Data Sharing

This section provides a step-by-step guide to data sharing and describes the services available to support it, along with their respective strengths. Most are available to Neuro members at no cost and require minimal implementation effort or technical knowledge. A summary table and example use cases complete the section.

1. Identify what to share and how

First decide what data to share and under what degree of restriction. Researchers should decide whether to share raw or processed data based on what is needed to reproduce published findings, what would be most useful to other researchers, and practical considerations (e.g., data size).

Additionally, ethical considerations are paramount for data from human participants. Generally, data may only be shared with participants' consent, as approved by the Research Ethics Board overseeing the project. The consent forms and approved protocols provide further information on how the data may be accessed. The data will fall into one of three access categories:

Fully Open Access: no restrictions, and only minimal conditions, are set on accessing the data.

Registered Access: enables the owner to define a set of conditions, such as licenses the requester must agree to. This may, for example, include anonymized, non-identifiable clinical data.

Controlled Access: the strictest tier, used most often for sensitive human data (e.g., genomics). Requests are reviewed by a committee before sharing.

A single dataset may contain different data types made available under different modes of access. The mode(s) of access needed for your data will affect your choice of a data repository (Step 2).

2. Identify a repository to share the data

Use the list below to identify an appropriate repository, based on the access type determined in Step 1 and the specifics of the dataset (e.g., size).

The focus here is on services available to Neuro researchers, with servers located in Canada, as is often required for sharing human data. When ethical and privacy considerations are not a concern, for instance with data from animal models, several additional options are available (Zenodo, Figshare, Dryad, etc.).

To learn more about data repositories, see the Harvard data management repository.

	The Canadian Open Neuroscience Platform (CONP) is a Brain Canada-funded platform created to support the sharing of neuroscience datasets.
	The Federated Research Data Repository (FRDR) is the Digital Research Alliance of Canada's data-sharing and preservation platform.
	The McGill Dataverse is a McGill Library service for data sharing, based on the open-source Dataverse software and connected with the repositories of other Canadian universities.
	The Clinical Biospecimen, Imaging and Genetic repository (C-BIG) is a platform and patient registry offering biobanking services, equipment samples and patient-information processing and management.

The following table presents each service in greater detail, including storage size and the types of data and datasets accepted.

Service	Data size	Dataset characteristics	Access	Additional information
CONP	50 GB, up to a few TB with approval	Static, publication-ready	Open	Can also share tools and pipelines; integrated with LORIS; DataLad¹ support. Ideal for larger datasets deposited by users with technical knowledge.
FRDR	1 TB, up to a few TB with approval	Static, publication-ready	Open	Allows “collections” for grouping datasets; Globus² support; DOI generation. Great for larger datasets deposited by users with no technical knowledge.
Dataverse	Default of 20 GB	Static, publication-ready	Open and Registered	User-friendly sharing platform; DOI generation. Excellent for sharing smaller datasets with the widest audience.
C-BIG (by request)	Project specific	Dynamic, ongoing cohorts, sharing while collecting	Any	Biobanking; ready-made consent form and ethics approval; tiered-access model. Optimal for sharing patient-derived data from ongoing cohorts during the course of the project.

¹ DataLad is a framework that allows a dataset to be distributed (i.e., aggregated from multiple places) but shared through a single link.

² Globus is software that allows non-technical users to perform easy, optimized data transfers between supported entities such as the Digital Research Alliance of Canada (DRAC).
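The table above can be read as a simple filter on access type, dataset size, and whether the cohort is still being collected. The following is a minimal sketch of that reading, not an official selection tool: the `Repository` class, the `shortlist` function, and the conversion of 1 TB to 1000 GB are assumptions made for illustration, and the limits are copied from the table as default values only.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Repository:
    name: str
    default_limit_gb: Optional[float]  # None: the limit is project specific
    access_modes: FrozenSet[str]
    supports_ongoing: bool  # accepts dynamic data from ongoing cohorts


# Values transcribed from the summary table (1 TB taken as 1000 GB).
REPOSITORIES = [
    Repository("CONP", 50, frozenset({"open"}), False),
    Repository("FRDR", 1000, frozenset({"open"}), False),
    Repository("Dataverse", 20, frozenset({"open", "registered"}), False),
    Repository("C-BIG", None, frozenset({"open", "registered", "controlled"}), True),
]


def shortlist(size_gb: float, access: str, ongoing: bool = False) -> List[str]:
    """Return candidate repositories with a note about the size limit."""
    results = []
    for repo in REPOSITORIES:
        if access not in repo.access_modes:
            continue  # repository does not offer this access mode
        if ongoing and not repo.supports_ongoing:
            continue  # repository only takes static, publication-ready data
        if repo.default_limit_gb is None:
            note = "size limit is project specific"
        elif size_gb <= repo.default_limit_gb:
            note = "within the default size limit"
        else:
            note = "exceeds the default limit; contact the repository"
        results.append(f"{repo.name}: {note}")
    return results


if __name__ == "__main__":
    for line in shortlist(size_gb=200, access="open"):
        print(line)
```

A shortlist like this only narrows the options; the consent forms and the approved ethics protocol still determine which repository is acceptable.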

3. Prepare the dataset

Dataset preparation involves multiple steps that vary depending on the context of the research. Generally speaking, these include:

De-identification of the data: Data from human participants should be stripped of any direct identifiers and of potentially identifying features (e.g., the face in MRI scans). Indirect identifiers, especially in combination, might lead to re-identification. They should be evaluated carefully, and modified or removed if needed.

Add information about the data (metadata) by creating data descriptors and documentation: a data dictionary, a “README” file, etc.

Organize and convert data files according to modality-specific data standards, when applicable. Harmonization tools exist to make this process easier, for example for neuroimaging and clinical data.

In line with the FAIR principles, shared data should be in open formats as much as possible. In some cases, this will include providing data in multiple formats, so that it can be processed by computers and used by people.
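As an illustration of the first two preparation steps for tabular data, the sketch below drops direct identifiers, replaces the participant ID with a keyed pseudonym, and generates an empty data dictionary to fill in. The column names, the `sub-` prefix, and the helper names are all hypothetical; it handles direct identifiers only, so indirect identifiers still need to be reviewed by hand.

```python
import hashlib
import hmac

# Hypothetical column names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "date_of_birth", "email"}


def pseudonymize(participant_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an ID using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    return "sub-" + digest.hexdigest()[:12]


def deidentify(rows, secret_key: bytes, id_column: str = "participant_id"):
    """Remove direct identifiers and pseudonymize the ID column of each row."""
    cleaned = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        kept[id_column] = pseudonymize(row[id_column], secret_key)
        cleaned.append(kept)
    return cleaned


def data_dictionary_template(rows):
    """List every shared variable with blank fields for the author to complete."""
    columns = list(rows[0].keys()) if rows else []
    return [{"variable": c, "description": "", "units_or_levels": ""} for c in columns]


if __name__ == "__main__":
    raw = [
        {"participant_id": "P001", "name": "Jane Doe",
         "date_of_birth": "1980-01-01", "age_group": "40-49", "score": "27"},
    ]
    shared = deidentify(raw, secret_key=b"keep-this-key-private")
    print(shared)
    print(data_dictionary_template(shared))
```

The key must stay private and must not be deposited with the data: anyone holding it could recompute the pseudonyms from the original IDs.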

4. Deposit the data

Depositing data in your chosen repository is generally straightforward, although the process varies between repositories. Common steps are:

Entering additional high-level metadata to enhance findability.

Choosing a license for your data, which makes clear what others can and cannot do with it. Depending on the details of the participant consent form and the approved research ethics protocol, you may need to write a custom license with a specific set of conditions.

The most commonly used licenses in the context of open research data are the Creative Commons licenses:

CC0: This public domain dedication lets others distribute, remix, adapt, and build upon your work, even commercially, without restriction and without requiring attribution.

CC BY: This license lets others distribute, remix, adapt, and build upon your work, even commercially, as long as they credit you for the original creation.

CC BY-NC: This license lets others remix, adapt, and build upon your work non-commercially, as long as they credit you for the original creation. Derivative works, however, may be distributed under different terms.

CC BY-SA: This license lets others remix, adapt, and build upon your work, even for commercial purposes, as long as they credit you and license their new creations under identical terms.

CC BY-NC-SA: This license lets others remix, adapt, and build upon your work non-commercially, as long as they credit you and license their new creations under identical terms.

A tool is available to support users new to licenses. For more information about Creative Commons licenses, contact McGill's Copyright and Digital Collections Librarian (copyright [at] mcgill.ca).
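The five licenses listed above differ along three conditions: whether credit is required, whether commercial use is allowed, and whether derivatives must carry identical terms. As a rough sketch of that comparison (the `LICENSES` table and `matching_licenses` function are illustrative names, not part of any Creative Commons tool), the conditions can be encoded and queried as follows:

```python
# Conditions summarized from the license descriptions in this section.
LICENSES = {
    "CC0": {"attribution": False, "commercial_use": True, "share_alike": False},
    "CC BY": {"attribution": True, "commercial_use": True, "share_alike": False},
    "CC BY-NC": {"attribution": True, "commercial_use": False, "share_alike": False},
    "CC BY-SA": {"attribution": True, "commercial_use": True, "share_alike": True},
    "CC BY-NC-SA": {"attribution": True, "commercial_use": False, "share_alike": True},
}


def matching_licenses(attribution: bool, commercial_use: bool, share_alike: bool):
    """Return the licenses whose conditions match the requested combination."""
    wanted = {
        "attribution": attribution,
        "commercial_use": commercial_use,
        "share_alike": share_alike,
    }
    return [name for name, terms in LICENSES.items() if terms == wanted]


if __name__ == "__main__":
    # Credit required, no commercial use, derivatives under identical terms.
    print(matching_licenses(attribution=True, commercial_use=False, share_alike=True))
```

An empty result means none of the five licenses fits, which is one sign that a custom license may be needed.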

5. Cite your data and get cited

Research data should be recognized as a valuable output and cited appropriately, much like a publication. Proper data citation ensures you receive credit for your work and acknowledges others when reusing open data. When sharing a dataset supporting a publication, cite it directly where relevant in the publication, include it in the references list, and link to it in the Data Availability Statement (if applicable). Citing data in the references and including a persistent identifier ensures proper aggregation of citations.

When reusing open datasets, always cite the dataset itself in the reference list, including its persistent identifier (e.g., DOI). Avoid citing only the publication describing the dataset; cite the dataset directly.
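To show what a dataset reference with a persistent identifier can look like, here is a small sketch that assembles one from its parts. The element order (creators, year, title, version, publisher, DOI link) is one common pattern and an assumption here; the function name and all example values, including the DOI, are placeholders. Repositories and journals often prescribe their own format, which should take precedence.

```python
from typing import List, Optional


def format_data_citation(
    creators: List[str],
    year: int,
    title: str,
    publisher: str,
    doi: str,
    version: Optional[str] = None,
) -> str:
    """Build a dataset reference ending in a resolvable DOI link."""
    authors = "; ".join(creators)
    parts = [f"{authors} ({year}).", f"{title}."]
    if version:
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    parts.append(f"https://doi.org/{doi}")  # persistent identifier as a URL
    return " ".join(parts)


if __name__ == "__main__":
    # All values below are made-up placeholders.
    print(
        format_data_citation(
            creators=["Doe, J.", "Roe, R."],
            year=2024,
            title="Example resting-state dataset",
            publisher="Example Repository",
            doi="10.1234/example-doi",
            version="1.0",
        )
    )
```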

Data sharing: the don'ts

Including a statement such as “Data available upon reasonable request” in a published article is not considered a data sharing best practice. Such requests are rarely successful, and the availability of the data remains tied to the researchers or labs that generated it.

Sharing data through links to common cloud services not designed for long-term storage and sharing (e.g., Google Drive, Dropbox) is not best practice. These services do not provide a persistent identifier for datasets and are not indexed by data search engines. Datasets stored this way can be moved, deleted or modified at any time by the data owner, breaking the link and cutting off access.

GitHub is great for collaborating on code and software but is not meant for sharing data. It does not provide a persistent identifier.

About this document

Unless otherwise indicated, all content on these pages is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Please attribute it to TOSI (the Tanenbaum Open Science Institute), this web page, and the contributors listed below.

The Neuro (Montreal Neurological Institute-Hospital) is a bilingual academic healthcare institution. We are a McGill research and teaching institute, delivering high-quality patient care as part of the Neuroscience Mission of the McGill University Health Centre. We are proud to be a Killam Institution, supported by the Killam Trusts.
