Wealth of data …
… collected in large field campaigns like EUREC4A (Stevens et al. 2021).
… and distributed among many research institutions.
How to … make data usable?
How to EUREC4A
Shareable analysis scripts in the form of an executable online book …
… and in doing so, it becomes the hub of a larger ecosystem.
The technical ecosystem
Jupyter Book - an online executable book
- book-like structure
- Markdown for narrative content, e.g. instrument details
- MyST for executable content, e.g. code examples
Taking the pain out of data access and distribution
- cataloging system: for listing datasets
- drivers: opening instructions
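The split between a catalog and its drivers can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how intake is implemented: `DRIVERS`, `CATALOG`, `list_datasets` and `open_dataset` are hypothetical names, and the entries carry inline text where a real catalog would point to files or URLs.

```python
import csv
import io
import json


# Hypothetical drivers: each one knows how to open one kind of source.
def open_csv(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def open_json(text):
    """Parse JSON text into Python objects."""
    return json.loads(text)


DRIVERS = {"csv": open_csv, "json": open_json}

# Hypothetical catalog: lists datasets by name and records the opening
# instructions (driver plus arguments) for each of them.
CATALOG = {
    "soundings": {
        "description": "toy sounding table",
        "driver": "csv",
        "args": {"text": "time,height\n0,10\n1,20\n"},
    },
    "stations": {
        "description": "toy station metadata",
        "driver": "json",
        "args": {"text": '{"name": "demo", "lat": 13.2}'},
    },
}


def list_datasets(catalog):
    """Return the sorted names of all datasets in the catalog."""
    return sorted(catalog)


def open_dataset(catalog, name):
    """Look up an entry and open it with the driver it names."""
    entry = catalog[name]
    driver = DRIVERS[entry["driver"]]
    return driver(**entry["args"])


soundings = open_dataset(CATALOG, "soundings")
stations = open_dataset(CATALOG, "stations")
```

The user only needs the dataset name; format and location stay inside the catalog entry.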
The EUREC4A intake catalog
EUREC4A intake catalog usage
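import eurec4a
# Load the EUREC4A intake catalog
cat = eurec4a.get_intake_catalog()
# Open the JOANNE level 3 dropsonde dataset lazily as an xarray Dataset
ds = cat.dropsondes.JOANNE.level3.to_dask()
# Scatter plot of flight_lat against flight_lon
ds.plot.scatter(x="flight_lon", y="flight_lat");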
Accessing remote data: OPeNDAP
OPeNDAP provides software which makes local data accessible to remote locations regardless of local storage format.
- advantage: powerful server, allows access to individual chunks of data
- disadvantage: error-prone
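Access to data chunks works through constraint expressions: in the DAP2 protocol, a request URL names a variable followed by one `[start:stride:stop]` hyperslab per dimension, with an inclusive stop index. The sketch below builds such a URL from Python slices. The server, file and variable names are hypothetical, and in practice client libraries generate these requests for the user.

```python
def hyperslab(index):
    """Convert a Python slice to a DAP2 hyperslab "[start:stride:stop]".

    Python slices exclude the stop index, DAP2 hyperslabs include it.
    """
    if index.start is None or index.stop is None:
        raise ValueError("explicit start and stop are required in this sketch")
    step = 1 if index.step is None else index.step
    if step < 1 or index.start < 0 or index.stop <= index.start:
        raise ValueError("only non-empty forward slices are supported")
    # Last index actually selected by the Python slice.
    last = index.start + ((index.stop - 1 - index.start) // step) * step
    return f"[{index.start}:{step}:{last}]"


def subset_url(base_url, variable, *slices):
    """Build a DAP2 data request for one variable, one slice per dimension."""
    constraint = variable + "".join(hyperslab(s) for s in slices)
    return f"{base_url}.dods?{constraint}"


# Hypothetical server, file and variable names.
url = subset_url(
    "https://example.org/opendap/dropsondes.nc",
    "ta",
    slice(0, 10),
    slice(0, 100, 5),
)
```

Only the requested subset travels over the network; the server does the slicing.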
Accessing remote data: the Zarr library
Zarr is a format for the storage of chunked, compressed, N-dimensional arrays.
- data in form of blocks or chunks
- each chunk is stored as a single file, which allows for parallel processing
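How chunking enables partial and parallel reads can be sketched without the Zarr library itself. The example assumes the Zarr version 2 naming scheme, in which a chunk key is the chunk's grid indices joined by "."; `chunk_keys` and the dictionary `store` are hypothetical stand-ins for the library and for a directory or object store.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def chunk_keys(chunks, selection, separator="."):
    """Return the keys of all chunks that overlap a selection.

    chunks: chunk length along each dimension
    selection: one (start, stop) pair per dimension, stop exclusive
    """
    ranges = []
    for size, (start, stop) in zip(chunks, selection):
        if stop <= start:
            raise ValueError("empty selections are not supported")
        first = start // size
        last = (stop - 1) // size
        ranges.append(range(first, last + 1))
    return [separator.join(str(i) for i in idx) for idx in product(*ranges)]


# Toy key-value store for a 100 x 100 array in chunks of 10 x 50.
store = {
    key: f"chunk {key}".encode()
    for key in chunk_keys((10, 50), ((0, 100), (0, 100)))
}

# Rows 5..24 and columns 0..49 touch only three of the twenty chunks.
keys = chunk_keys(chunks=(10, 50), selection=((5, 25), (0, 50)))

# Each chunk is an independent object, so the reads can run in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    blocks = list(pool.map(store.__getitem__, keys))
```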
Accessing remote data: IPFS - the InterPlanetary File System
IPFS is a distributed system for storing and accessing files, websites, applications, and data.
- datasets can be pinned on several distributed servers
- IPFS searches for your requested dataset and delivers it from any nearby server that is running
- data is identified by its content (CID)
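The idea of content addressing and pinning can be illustrated with a toy model. This is a strong simplification and not the IPFS implementation: real CIDs are multihash-based and large files are split into blocks, whereas the sketch uses a plain SHA-256 hex digest. `Node`, `content_id` and `fetch` are hypothetical names.

```python
import hashlib


def content_id(data):
    """Toy content identifier: the hex SHA-256 digest of the bytes."""
    return hashlib.sha256(data).hexdigest()


class Node:
    """Hypothetical server that pins content under its identifier."""

    def __init__(self, name, online=True):
        self.name = name
        self.online = online
        self.pinned = {}

    def pin(self, data):
        """Store the data and return its content identifier."""
        cid = content_id(data)
        self.pinned[cid] = data
        return cid


def fetch(cid, nodes):
    """Return (node name, data) from the first online node holding the content."""
    for node in nodes:
        if node.online and cid in node.pinned:
            data = node.pinned[cid]
            # The identifier doubles as a checksum for the delivered bytes.
            if content_id(data) != cid:
                raise ValueError(f"corrupt content from {node.name}")
            return node.name, data
    raise LookupError(f"no online node provides {cid}")


dataset = b"time,height\n0,10\n"
nodes = [Node("node-a", online=False), Node("node-b"), Node("node-c")]

# The same dataset is pinned on two nodes and gets the same identifier on both.
cid = nodes[0].pin(dataset)
nodes[1].pin(dataset)

# node-a is offline, so the request is served by node-b.
source, data = fetch(cid, nodes)
```

Because the identifier is derived from the content, the user asks for what the data is, not where it lives, and can verify what was delivered.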
Putting the puzzle back together
The social ecosystem
Thanks to all contributors!
More examples…
Highlights
- goes far beyond FAIR data: fosters analysis-ready cloud-optimized data (Abernathy et al., 2021)
- data is used efficiently - duplicates are avoided
- inclusion
- collaboration and knowledge transfer
- scalable to meet future data needs
Conclusions
- an openly accessible and executable online book makes campaign data tangible
- explanations about the available instruments, data and typical usage patterns help to get started with the data
- as the hub of a larger ecosystem, the book makes data visible, easily accessible, shareable and usable
- the book thrives on, enables and stimulates collaboration and inclusion
- positive feedback loops stimulate the publication of accessible and understandable datasets in an analysis-friendly way