Data Sharing Plan
Scientific data management and sharing at the CCME will meet the requirements and spirit of the NOAA Plan for Increasing Public Access to Research Results (NOAA 2015). The CCME recognizes the importance of not just sharing data publicly, but sharing it quickly and effectively. This is particularly true when developing information products that help coastal stakeholders make decisions. Transparency about the data underlying recommendations is crucial for the public and for other researchers to assess results. Furthermore, sharing data early in the research process improves documentation and quality, because the data are assessed and reused by others, and makes more efficient use of research funds.
Prior Experience in Publishing Data
CCME principal investigator James Gibeaut of the Harte Research Institute (HRI), in conjunction with the CCME Data Information and Communications Manager, will serve as the CCME’s Scientific Data Sharing Team. Dr. Gibeaut is the Director of the Gulf of Mexico Research Initiative Information and Data Cooperative (GRIIDC, https://data.gulfresearchinitiative.org/), which manages the sharing of all research data resulting from the 10-year, $500 million research program. GRIIDC is advancing a data sharing culture in the Gulf of Mexico with the mission to ensure a data and information legacy that promotes continual scientific discovery and public awareness of the Gulf of Mexico ecosystem. The CCME will benefit from our GRIIDC experience and infrastructure as well as from GOMAportal (gomaportal.org), a geospatial data portal that HRI developed and maintains for the Gulf of Mexico Alliance and that currently serves data from the 2011-2016 NOAA-Environmental Cooperative Science Center (ECSC). GOMAportal also exposes its metadata catalog for harvest by the National Centers for Environmental Information (NCEI).
Data Types
The CCME is a transdisciplinary endeavor that will use existing and newly acquired data to address research problems. Datasets involving socio-economic, biological, physical, chemical, remote sensing, and other coastal environmental observations as well as model output will be included. Decision support and outreach products will also be generated and made available through websites linked to the CCME website. Download links to the underlying datasets used in these information products will be provided.
CCME Data Availability Policy
Our goal is to make data and accompanying metadata publicly available within one year of acquisition or before publication or use in a publicly available decision support tool or outreach material. To accomplish this, the CCME will track datasets that are in development. CCME researchers will be required to identify datasets they plan to develop and fill out a Dataset Information Form (DIF) that provides basic information on the person responsible for the dataset, type of data, expected size, time of acquisition, methods, standards, security, and expected delivery date. This DIF, or collection of DIFs, will serve as the data management plan for the researcher and will be approved and filed with the Data Information and Communications Manager, who will track progress. DIFs will be made publicly available.
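As an illustration only, the DIF fields listed above and the one-year availability goal could be tracked with a small record such as the one sketched below. The class name, field names, and the `release_deadline` helper are hypothetical and do not represent an official CCME or GRIIDC schema. The sketch also assumes that the release deadline is whichever comes first: one year after acquisition or the planned date of publication or public use.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DatasetInformationForm:
    """Hypothetical DIF record; field names are illustrative, not an official schema."""
    responsible_person: str
    data_type: str
    expected_size_gb: float
    acquisition_date: date
    methods: str
    standards: str
    security: str
    expected_delivery_date: date


def one_year_after(day: date) -> date:
    """Return the same calendar day one year later (Feb 29 maps to Feb 28)."""
    try:
        return day.replace(year=day.year + 1)
    except ValueError:
        return day.replace(year=day.year + 1, day=28)


def release_deadline(dif: DatasetInformationForm,
                     planned_use_date: Optional[date] = None) -> date:
    """Assumed rule: the earlier of one year after acquisition and the
    planned publication or public-use date, when one is known."""
    deadline = one_year_after(dif.acquisition_date)
    if planned_use_date is not None and planned_use_date < deadline:
        deadline = planned_use_date
    return deadline


def is_on_track(dif: DatasetInformationForm,
                planned_use_date: Optional[date] = None) -> bool:
    """True when the expected delivery date meets the assumed deadline."""
    return dif.expected_delivery_date <= release_deadline(dif, planned_use_date)
```

A tracking list built from such records would let the Data Information and Communications Manager flag any dataset whose expected delivery date falls after its deadline.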
Data Stewardship, Standards, Formats, and Content
Datasets will be in a variety of forms, such as tabular data in CSV format, GIS data in shapefiles, and remote sensing and model output in various grid formats or Common Data Format (CDF). In some cases, the data may be stored in a proprietary format that is used by an instrument manufacturer and can only be accessed using proprietary software. In these cases, the data will be transformed to a common format, although the dataset package may contain the raw unprocessed data as well. Dataset “packages” may contain different levels of processing of the raw data, but the goal will be to provide data that are reusable and can validate research outputs. In Table 5, data processing levels are described with an example of topographic lidar data. In some cases, proprietary level 0 data may not be reusable by other researchers and therefore may not be shared, particularly in the case of a very large dataset.
Levels of Data Processing and Sharing with Lidar Example
Acquisition Type: Remote sensing – topographic lidar

| Processing level | Examples | TAA |
| --- | --- | --- |
| Early Post Acquisition | Data coverage/flight lines | 1 |
| Level 0 unprocessed data at full resolution | Observables in proprietary format | X |
| Level 1 data calibrated and processed for relevant parameters in a structured, common format | X,Y,Z elevation points | 6 |
| Level 2 derived products, statistics, or combined parameters | Digital elevation model | 10 |
| Level 3 interpreted data | Extracted features, e.g., shorelines | 12 |
Metadata used by the CCME will be ISO 19115-2 compliant in XML format. ISO 19115-2 metadata is extensible and will include processing steps and error analysis to describe the methods used to collect, process, and analyze data. The GRIIDC metadata editor (https://data.gulfresearchinitiative.org/metadata-editor-start) will be available to CCME researchers. Before public release, dataset packages, including the metadata, will be reviewed for completeness and compliance to ensure usability by others. Once the dataset package is approved, HRI will issue a dataset Digital Object Identifier (DOI) and provide a persistent link to the dataset landing page.
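To make the metadata structure concrete, the sketch below builds a bare skeleton of an ISO 19115-2 style record with Python's standard library and counts its processing steps. It is a minimal illustration, not the output of the GRIIDC metadata editor: the helper names and the example identifier are hypothetical, and the skeleton omits many elements that a schema-valid record requires (for example contact, date stamp, and citation). The element names and namespaces follow the ISO 19139 XML encoding of the standard.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the ISO 19139 XML encoding of ISO 19115 / 19115-2.
NS = {
    "gmi": "http://www.isotc211.org/2005/gmi",
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, tag):
    """Return the fully qualified ElementTree tag for a prefixed name."""
    return f"{{{NS[prefix]}}}{tag}"


def add_text(parent, tag, text):
    """Append <gmd:tag><gco:CharacterString>text</...></...> to parent."""
    wrapper = ET.SubElement(parent, q("gmd", tag))
    ET.SubElement(wrapper, q("gco", "CharacterString")).text = text
    return wrapper


def build_skeleton(file_id, abstract, process_steps):
    """Build an incomplete, illustrative record with lineage process steps."""
    root = ET.Element(q("gmi", "MI_Metadata"))
    add_text(root, "fileIdentifier", file_id)

    ident_info = ET.SubElement(root, q("gmd", "identificationInfo"))
    ident = ET.SubElement(ident_info, q("gmd", "MD_DataIdentification"))
    add_text(ident, "abstract", abstract)

    quality_info = ET.SubElement(root, q("gmd", "dataQualityInfo"))
    quality = ET.SubElement(quality_info, q("gmd", "DQ_DataQuality"))
    lineage_wrap = ET.SubElement(quality, q("gmd", "lineage"))
    lineage = ET.SubElement(lineage_wrap, q("gmd", "LI_Lineage"))
    for step in process_steps:
        step_wrap = ET.SubElement(lineage, q("gmd", "processStep"))
        step_el = ET.SubElement(step_wrap, q("gmd", "LI_ProcessStep"))
        add_text(step_el, "description", step)
    return root


def count_process_steps(root):
    """Count documented processing steps, e.g. as one completeness check."""
    return len(root.findall(".//gmd:LI_ProcessStep", NS))
```

A reviewer's completeness check could, for instance, reject a package whose record reports zero processing steps; full compliance would still need validation against the official schemas.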
Providing Public Access, Preservation, and Security
Datasets will be deposited in the publicly accessible GRIIDC data repository and will be identified as a NOAA-CCME collection. GRIIDC has redundant systems at two sites and tape backups stored in a secure offsite location (Iron Mountain in San Antonio, Texas). By the end of this program, datasets will be copied to appropriate national archives, primarily the NCEI, to increase exposure and ensure long-term preservation. Dr. Julie Bosch, NCEI’s Gulf of Mexico Regional Science Officer, is our liaison for data transfers.