Meet secondees

Mindy Munoz

Mindy Stephania Muñoz

PhD student
Home institute: University of São Paulo
Country of residence: Brazil
Highest qualification: PhD
Name of the host: Juan Antonio Vizcaino
Projects start date:
Relevant challenge area(s): Communicable disease, Sustainable food production, Protection of biodiversity
Projects end date:
Type of project: Software curation

Mindy Stephania Muñoz is a Ph.D. student in the Bioinformatics Graduate Program at the Institute of Mathematics and Statistics, University of São Paulo, Brazil. After obtaining her BSc in Bioinformatics, she worked at the Catholic University of Chile on Vitis vinifera improvement research, at the Center of Genomics and Bioinformatics of the Mayor University with public and private genomic data, and at the Institute of Nutrition and Food Technology (INTA) at the University of Chile, where she applied genomics and transcriptomics to assist food and crop breeding. She then joined the Computational Systems Biology Laboratory led by Dr. Helder Nakaya, with the aim of integrating omics data, with a specific focus on human cancer. Proteomics is a new challenge for her in this area, and EMBL-EBI offers an excellent opportunity to learn about the analysis of proteomics datasets and metadata and about the repository’s procedures for handling them.

Project summary

The PRoteomics IDEntifications (PRIDE) database is a centralized, standards-compliant, public repository for proteomics data, including protein and peptide identifications, post-translational modifications and spectral evidence. Data re-use and result validation are firmly embedded in its core function of data dissemination. Its data services have proven pivotal to many studies that reanalyse data in a wider context than that of the original publications.
In any analytical discipline, the reproducibility of data analysis is closely linked to data quality. Quality Control (QC) provides a way to improve the reliability, reproducibility and consistency of analytical measurements in proteomics, which is important for datasets stored in any repository. Tools developed and used for QC and re-analysis need to be subject to the same level of care and control before they can become part of a repository’s procedures. Containerisation, unit tests and CI/CD (continuous integration/continuous deployment) are important tools for making sure that software and pipelines work to specification. A firm understanding of how the data within a repository relate to its datasets is essential for successful re-analysis. Therefore, learning to handle data according to its metadata and, where necessary, improving data annotation and stratifying existing metadata will be part of the goals she will focus on.

Project outcomes and impact

TBA