Reproducible Data Management

Information and resources for reproducible data management for the UCSF research community

Why publish your research data?

Many biomedical journals and funders now require researchers to make their research data publicly available when they publish their results. Journals like PLOS, PNAS, Science, and Nature require data sharing as a prerequisite for publication. Funders like the Howard Hughes Medical Institute, NIH, and the Gates Foundation require grantees to make data public alongside their articles. But what does it mean to publish your data? This page gives you an overview of the process.

Select data and documentation for sharing

What data will you share? This depends largely on your field of research, but you should consider what someone else in your field would need to validate your results.

What documentation needs to accompany the data? Data by itself is seldom useful. What data dictionaries, metadata, or code would someone else need to use your data?
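As one illustration of the documentation mentioned above, the sketch below drafts a starting data dictionary from a CSV file using only the Python standard library. The function names (build_data_dictionary, infer_type) and the type labels are hypothetical choices made for this example, not part of any UCSF tool, and the type guessing is deliberately crude. The description field is left empty because only the researcher can supply it.

```python
import csv
import io


def infer_type(values):
    """Guess a simple type label from a list of non-empty strings.

    Hypothetical helper for this sketch; the labels are illustrative only.
    """
    if not values:
        return "unknown"
    for label, cast in (("integer", int), ("decimal", float)):
        try:
            for value in values:
                cast(value)
            return label
        except ValueError:
            continue
    return "text"


def build_data_dictionary(csv_text):
    """Return one entry per column: name, guessed type, missing count, example."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        # Treat empty strings and absent fields as missing values.
        values = [row[name] for row in rows if row[name] not in ("", None)]
        entries.append({
            "variable": name,
            "type": infer_type(values),
            "missing": len(rows) - len(values),
            "example": values[0] if values else "",
            "description": "",  # to be written by hand
        })
    return entries
```

Calling build_data_dictionary on the text of a CSV file returns a list of dictionaries that can be written out as a table and completed with units, allowed values, and plain-language descriptions before sharing.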

Ensure you have permission to share

Do you have consent to share?

If your research involves human subjects, did you mention data sharing in your IRB and informed consent documents?

De-identify your data

Has your data been de-identified?

It is important to properly de-identify your data to reduce the risk that individuals in the dataset can be identified. UCSF now offers a service to validate de-identification.
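To make the idea concrete, here is a minimal sketch of one common first step: dropping columns that directly identify a person and replacing the original ID with a random study code. The column names, the patient_id default, and the deidentify function are all assumptions made for this example. This step alone does not guarantee that a dataset is de-identified; indirect identifiers such as dates, locations, or rare values may still need review.

```python
import csv
import io
import secrets

# Hypothetical column names; adjust to the dataset at hand.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "medical_record_number"}


def deidentify(csv_text, id_column="patient_id"):
    """Drop direct-identifier columns and replace IDs with random study codes.

    Returns (deidentified_csv_text, key), where key maps original IDs to codes.
    The key must be stored separately and securely, or destroyed.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [c for c in reader.fieldnames or [] if c not in DIRECT_IDENTIFIERS]
    key = {}
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        original = row[id_column]
        if original not in key:
            # Random codes cannot be reversed without the key.
            code = "P-" + secrets.token_hex(4)
            while code in key.values():
                code = "P-" + secrets.token_hex(4)
            key[original] = code
        row[id_column] = key[original]
        writer.writerow(row)
    return out.getvalue(), key
```

The same participant receives the same code on every row, so repeated measurements remain linkable within the shared file while the original ID is no longer present.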

Research data repositories

Public data repositories are the best place to publish your research data. Repositories preserve and archive your data and make it easy for others to find and cite it. The best repositories are the ones specific to your discipline (especially those at NIH) because they are designed with your community in mind. That said, there are several general-purpose data repositories that are also excellent.