
Data Sharing

This section provides a step-by-step guide to data sharing, describes the services available to support it, and outlines their respective strengths. Most are available to Neuro members at no cost and require minimal implementation effort or technical knowledge. A summary table with example use cases completes the section.


1. Identify what to share and how

First, decide what data to share and under what level of restriction. Researchers should decide whether to share raw or processed data based on what is needed to reproduce published findings, what would be most useful to other researchers, and practical considerations (e.g., data size).

Additionally, ethical considerations are paramount for data from human participants. Generally, data may only be shared with participants’ consent, as approved by the Research Ethics Board overseeing the project. The consent forms and approved protocols will provide further information regarding how the data may be accessed. The data will fall into one of three access categories:

  • Fully Open Access: no restrictions, and only minimal conditions, are placed on accessing the data.

  • Registered Access: enables the owner to define a set of conditions, such as licenses the requester must agree to. This may include, for example, anonymized, non-identifiable clinical data.

  • Controlled Access: the strictest tier, used most often for sensitive human data (e.g., genomics). Requests are reviewed by a committee before the data are shared.

A single dataset may contain different data types made available under different modes of access. The mode(s) of access needed for your data will affect your choice of a data repository (Step 2).

2. Identify a repository to share the data

Use the list below to identify an appropriate repository, based on the access type determined in Step 1 and the specifics of the dataset (e.g., size).

The focus here is on services available to Neuro researchers, with servers located in Canada, as is often required for sharing human data. When ethical and privacy considerations are not a concern, for instance in the case of data from animal models, several additional options are available (Zenodo, Figshare, Dryad, etc.).

Resources to learn more about data repositories: and The Harvard data management repository .

Canadian Open Science Platform

The Canadian Open Science Platform (CONP) is a Brain Canada-funded platform created to support the sharing of neuroscience datasets.

Federated Research Data Repository

The Federated Research Data Repository (FRDR) is the data-sharing and preservation platform of the Digital Research Alliance of Canada.

The McGill Dataverse

The McGill Dataverse is a McGill Library service for data sharing, based on the open-source Dataverse software and connected with the repositories of other Canadian universities.

C-BIG

The Clinical Biospecimen, Imaging and Genetic repository (C-BIG) is a platform and patient registry offering biobanking services, equipment samples and patient-information processing and management.

The following table presents each service in greater detail, including storage size, the type of dataset accepted, and the access modes supported.

| Service | Data size | Dataset characteristics | Access | Additional information |
| --- | --- | --- | --- | --- |
| CONP | 50 GB, up to a few TB with approval | Static, publication-ready | Open | Can also share tools and pipelines; integrated with LORIS; DataLad¹ support. Ideal for larger datasets deposited by users with technical knowledge. |
| FRDR | 1 TB, up to a few TB with approval | Static, publication-ready | Open | Allows “collections” for grouping datasets; Globus² support; DOI generation. Great for larger datasets deposited by users with no technical knowledge. |
| Dataverse | Default of 20 GB | Static, publication-ready | Open and Registered | User-friendly sharing platform; DOI generation. Excellent for sharing smaller datasets to the widest audience. |
| C-BIG (by request) | Project specific | Dynamic, ongoing cohorts, sharing while collecting | Any | Biobanking; ready-made consent form and ethics approval; tiered-access model. Optimal for sharing patient-derived data from ongoing cohorts and during the course of the project, in a tiered-access model. |

¹ DataLad is a framework that allows a dataset to be distributed (i.e., aggregated from multiple places) but shared through a single link.

² Globus is software that allows non-technical users to perform easy and optimized data transfers between supported entities, such as the Digital Research Alliance of Canada (DRAC).
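
As a rough illustration, the table above can be encoded as a small lookup that shortlists services for a given dataset. This is a minimal sketch in Python, not an official selection tool. The class, field, and function names are hypothetical; the size limits are the default values from the table (larger deposits "with approval" are not modelled); and C-BIG is treated as having no fixed limit because its capacity is project specific.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Repository:
    name: str
    default_limit_gb: Optional[float]  # None means "project specific"
    access_modes: frozenset            # access categories from Step 1
    ongoing: bool                      # True if meant for data still being collected


# Values transcribed from the table; 1 TB is taken as 1000 GB (assumption).
REPOSITORIES = [
    Repository("CONP", 50, frozenset({"open"}), False),
    Repository("FRDR", 1000, frozenset({"open"}), False),
    Repository("Dataverse", 20, frozenset({"open", "registered"}), False),
    Repository("C-BIG", None, frozenset({"open", "registered", "controlled"}), True),
]


def shortlist(size_gb: float, access: str, ongoing: bool = False) -> List[str]:
    """Return the names of services whose table entry matches the dataset."""
    matches = []
    for repo in REPOSITORIES:
        if repo.ongoing != ongoing:
            continue  # static repositories vs. sharing while collecting
        if access not in repo.access_modes:
            continue  # repository does not offer this access category
        if repo.default_limit_gb is not None and size_gb > repo.default_limit_gb:
            continue  # exceeds the default quota (approval not modelled)
        matches.append(repo.name)
    return matches


print(shortlist(10, "open"))
print(shortlist(10, "registered"))
print(shortlist(5, "controlled", ongoing=True))
```

A shortlist like this is only a starting point: consent forms, ethics approval, and repository staff should confirm the final choice.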

3. Prepare the dataset

Dataset preparation involves multiple steps that will vary depending on the context of the research. Generally speaking, these will include:

  • De-identification of the data: Data from human participants should be stripped of any direct identifiers, or potentially identifying features (e.g., face from MRI scans). Indirect identifiers, especially when present in combinations, might lead to re-identification. They should be evaluated carefully, and modified or removed if needed. For further information on de-identification, see .

  • Add information about the data (metadata) by creating data descriptors and documentation, such as a data dictionary and a “README” file.

  • Organize and convert data files according to modality-specific data standards, when applicable (e.g. ). Some harmonization tools exist to make this process easier (e.g. for Neuroimaging and clinical data)

  • In line with the FAIR principles, shared data should be in open formats as much as possible. In some cases, this will include providing data in multiple formats, so that it can be processed by computers and used by people.
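
The de-identification step can be sketched for tabular data as follows. This is a minimal Python illustration under stated assumptions, not a complete or approved procedure: the column names (`name`, `email`, `health_card_number`, `participant_id`, `age`) are hypothetical, the keyed hash is just one possible way to generate pseudonymous IDs, and grouping ages into 10-year bands is only one example of modifying an indirect identifier. Real datasets still need a careful review of identifier combinations.

```python
import hashlib
import hmac

# Hypothetical names of columns holding direct identifiers.
DIRECT_IDENTIFIERS = {"name", "email", "health_card_number"}


def pseudonym(participant_id: str, secret: str) -> str:
    """Derive a stable pseudonymous ID from the original ID with a keyed hash."""
    digest = hmac.new(secret.encode("utf-8"),
                      participant_id.encode("utf-8"),
                      hashlib.sha256).hexdigest()
    return "sub-" + digest[:8]


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band, e.g. 47 becomes '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record: dict, secret: str) -> dict:
    """Drop direct identifiers, pseudonymize the ID, and coarsen the age."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["participant_id"] = pseudonym(str(record["participant_id"]), secret)
    if "age" in cleaned:
        cleaned["age"] = age_band(int(cleaned["age"]))
    return cleaned


row = {"participant_id": "P001", "name": "Jane Doe",
       "email": "jane@example.org", "age": "47", "score": "12"}
print(deidentify(row, secret="keep-this-key-out-of-the-shared-dataset"))
```

The secret key must be stored separately from the shared data; anyone holding it could regenerate the mapping from original IDs to pseudonyms.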

4. Deposit the data

Depositing data in your chosen repository is generally straightforward, although the process varies between repositories. Common steps are:

  • Entering additional high-level meta-data, enhancing findability.

  • Choosing a license for your data, which makes clear what others can and cannot do with it. Depending on the details of the participant consent form and the approved research ethics protocol, you may need to write a custom license with a specific set of conditions.

The most commonly used licenses in the context of open research data are the Creative Commons licenses:

  • CC0: This public-domain dedication lets others distribute, remix, adapt, and build upon your work, even commercially, with no conditions such as attribution.

  • CC BY: This license lets others distribute, remix, adapt, and build upon your work, even commercially, as long as they credit you for the original creation.

  • CC BY-NC: This license lets others remix, adapt, and build upon your work non-commercially, as long as they credit you for the original creation. Derivative works, however, may be distributed under different terms.

  • CC BY-SA: This license lets others remix, adapt, and build upon your work even for commercial purposes, as long as they credit you and license their new creations under the identical terms.

  • CC BY-NC-SA: This license lets others remix, adapt, and build upon your work non-commercially, as long as they credit you and license their new creations under identical terms.
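
The relationship between the conditions you want to impose and the licenses listed above can be summarized in a few lines of Python. This is an illustrative sketch only: the function name and its three yes/no parameters are hypothetical, it covers only the five licenses in the list, and it does not replace checking the consent form and ethics protocol.

```python
def suggest_cc_license(require_attribution: bool,
                       allow_commercial: bool,
                       share_alike: bool) -> str:
    """Map three yes/no conditions onto one of the licenses listed above."""
    if not require_attribution:
        # Only CC0 waives attribution, and it imposes no other condition.
        if allow_commercial and not share_alike:
            return "CC0"
        raise ValueError(
            "None of the listed licenses waives attribution "
            "while adding other conditions."
        )
    code = "CC BY"
    if not allow_commercial:
        code += "-NC"   # non-commercial reuse only
    if share_alike:
        code += "-SA"   # derivatives must carry identical terms
    return code


print(suggest_cc_license(require_attribution=True,
                         allow_commercial=False,
                         share_alike=True))
```
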

A tool is available to support users new to licenses. For more information about Creative Commons licenses, contact McGill’s Copyright and Digital Collections Librarian (copyright [at] mcgill.ca).

5. Cite your data and get cited

Research data should be recognized as a valuable output and cited appropriately, much like a publication. Proper data citation ensures you receive credit for your work and acknowledges others when reusing open data. When sharing a dataset supporting a publication, cite it directly where relevant in the publication, include it in the references list, and link to it in the Data Availability Statement (if applicable). Citing data in the references and including a persistent identifier ensures proper aggregation of citations.

When reusing open datasets, always cite the dataset itself in the reference list, including its persistent identifier (e.g., DOI). Avoid citing only the publication describing the dataset; cite the dataset directly.
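
To make the elements of a dataset citation concrete, the sketch below assembles one from its parts. It is a minimal Python example under stated assumptions: the function name is hypothetical, the layout loosely follows a common author-date pattern rather than any style mandated here, and the authors, title, repository, and DOI in the usage line are placeholders, not a real dataset. Check the style required by your journal or repository.

```python
from typing import Sequence


def format_dataset_citation(authors: Sequence[str], year: int, title: str,
                            repository: str, doi: str) -> str:
    """Build a reference-list entry that includes the persistent identifier."""
    if not authors:
        raise ValueError("At least one author is required.")
    if not doi:
        raise ValueError("A persistent identifier (e.g., DOI) is required.")
    if len(authors) > 1:
        author_text = ", ".join(authors[:-1]) + ", & " + authors[-1]
    else:
        author_text = authors[0]
    return (f"{author_text} ({year}). {title} [Data set]. "
            f"{repository}. https://doi.org/{doi}")


# Placeholder values for illustration only.
print(format_dataset_citation(["Doe, J.", "Roe, R."], 2024,
                              "Example MRI cohort", "Example Repository",
                              "10.0000/example"))
```
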

Data sharing: the don'ts

  • Including a statement such as “Data available upon reasonable request” in a published article is not considered best practice for data sharing. Such requests are rarely successful (see ), and the availability of the data remains tied to the researcher or lab that generated it.

  • Sharing data by providing links to common cloud services not made for long term storage and sharing (e.g., Google Drive, Dropbox), is not best practice. These services do not provide a persistent identifier for datasets and are not indexed by data search engines. Datasets stored that way can be moved, deleted or modified at any time by the data owner, breaking the link and losing access.

  • GitHub is great for collaborating on code and software but is not meant for sharing data. It does not provide a persistent identifier.

About this document

Unless otherwise indicated, all content on these pages is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Please attribute it to TOSI (the Tanenbaum Open Science Institute), this web page, and the contributors listed below.
