Submission Guidelines
Choosing a licence for your dataset
It is important that any data being shared is provided with a valid licence that tells users what they are allowed to use the data for. Licensing considerations for software and datasets are quite different, so typical open source software licences may not be suitable for sharing datasets.
Licence types for machine learning (ML) models also fall into a separate category and should be treated as distinct from software and data licences.
Typical licences suitable for data include:
- Creative Commons (default on Zenodo)
  - Creative Commons Attribution Share-Alike 4.0 (CC-BY-SA-4.0)
  - Creative Commons Attribution 4.0 (CC-BY-4.0)
- Open Data Commons
  - Open Data Commons Attribution License (ODC-By-1.0)
- Open/Non-Commercial Government Licence
- Public Domain
Licensing can be complicated; if you need help selecting an appropriate licence, speak to an expert within your organisation. Your funding provider or organisation may also have its own requirements on which licence can be used for sharing data.
TUSAIL Community
A community has been created on the Zenodo open science platform. It provides a simple way of organising related datasets into a single collection.
However, there is no restriction on where the dataset is shared from, as it is the dataset's DOI that is stored within the database.
Preparing a Submission
Enhancing Interoperability
Interoperability is one of the four pillars of FAIR data (Findable, Accessible, Interoperable, Reusable), and several measures can be used to make data highly interoperable:
- Use machine-readable formats for datasets
  - Structured data is preferable, as it is easily defined and understood
  - Use open formats where possible:
    - Scientific formats such as HDF5 and netCDF
    - ASCII, CSV, JSON
- Use sensible variable names, e.g. force and torque are better than x and y
  - Avoid spaces or special characters in variable names
- Don't use commas as decimal separators in numbers
  - These can easily be mistaken for the field separators in CSV files; format numbers as 675454453.00 or 0.00007654
- Avoid image-only datasets for quantitative results
- Store metadata in a file format that is both machine-readable and human-readable, and never use proprietary formats for metadata files. Suitable formats:
  - CSV
  - JSON
  - XML
  - YAML (.yaml / .yml)
  - TOML
- For visualising datasets and, where appropriate, sharing DEM data, file formats such as .vtk/.vtu are well supported
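As a minimal sketch of several of the points above, the following Python code (standard library only) writes a CSV data file with descriptive column names and stores its metadata in a separate JSON file. The function names, the file names data.csv and metadata.json, and the variable-naming rule (letters, digits and underscores, starting with a letter) are illustrative assumptions, not a TUSAIL requirement.

```python
import csv
import json
import re
from pathlib import Path

# Assumed naming rule: start with a letter, then letters, digits or
# underscores only (no spaces or special characters).
VALID_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def check_variable_names(names):
    """Raise ValueError if any variable name breaks the naming rule."""
    bad = [name for name in names if not VALID_NAME.match(name)]
    if bad:
        raise ValueError(f"Invalid variable names: {bad}")


def write_dataset(folder, columns, rows, metadata):
    """Write rows to data.csv and the metadata dict to metadata.json."""
    check_variable_names(columns)
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)

    # newline="" lets the csv module control line endings itself
    with open(folder / "data.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        # Python converts floats to text with '.' as the decimal
        # separator regardless of the system locale, so commas only
        # ever appear as field separators.
        writer.writerows(rows)

    # Indented JSON is both machine-readable and human-readable
    with open(folder / "metadata.json", "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=2)
```

Python's default float formatting may produce exponent notation for very small values (0.00007654 is written as 7.654e-05); this is still read correctly by CSV readers and numeric parsers. If a fixed layout such as 675454453.00 is required, each value can be formatted explicitly before writing.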
Metadata Accompanying the Dataset
The README File - Putting the R in FAIR
The README file is an important file that is distributed with your dataset. It conveys key information about the dataset and its structure so that your dataset can easily be reused. It will typically contain a description of the dataset, but it should also explain how best to use the data, which tools were used to prepare the dataset, its provenance (when and where it was collected or generated) and the licence under which it is being shared.
The README file should be in plain-text format (NEVER use a proprietary format, although PDF can be acceptable) and should be well organised and human-readable. Markdown files (*.md) have become the default format for README files in many repositories because they are human-readable but also support syntax that can easily be rendered online or to PDF.
Recommended content for a README file:
- Title of the dataset
- Keywords
- Investigator / contact person
- Collection/generation timeframe
- Methods of collection/generation
- Deviations from standards
- Description of the dataset:
  - Filename(s); these can sometimes be listed in a separate index file
  - File structure
  - etc.
- Licence
You may also want to include instructions on how to cite your dataset (the DOI and/or a link to the repository) within the README file.
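One way to make sure no recommended section is forgotten is to generate a README skeleton and then fill it in by hand. The Python sketch below builds a Markdown README from the content list above plus a citation section; the function name make_readme, the field keys and the "TODO" placeholder are hypothetical choices for illustration only.

```python
# (Markdown heading, field key) pairs following the recommended content
# list; the keys are hypothetical names chosen for this sketch.
SECTIONS = [
    ("Keywords", "keywords"),
    ("Investigator / Contact Person", "contact"),
    ("Collection/Generation Timeframe", "timeframe"),
    ("Methods of Collection/Generation", "methods"),
    ("Deviations from Standards", "deviations"),
    ("Description of Dataset", "description"),
    ("Licence", "licence"),
    ("How to Cite", "citation"),
]


def make_readme(title, fields):
    """Return README.md text; missing fields become TODO placeholders."""
    lines = [f"# {title}", ""]
    for heading, key in SECTIONS:
        value = fields.get(key, "TODO")
        if isinstance(value, (list, tuple)):
            # Render lists (e.g. keywords or filenames) as bullet points
            body = "\n".join(f"- {item}" for item in value)
        else:
            body = str(value)
        lines += [f"## {heading}", "", body, ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(make_readme("Example dataset", {"licence": "CC-BY-4.0"}))
```

Any section left as TODO in the output is an easy visual check that the README is not yet complete before the dataset is submitted.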
Note
Much of the content of the README file may also be stored as metadata on the repository, but the dataset itself needs to include this information because it may be shared with people who never visit the repository.