Open Science Archive

An open-source, domain-agnostic scientific data archive.

Operate a trusted data archive for your field.

OSA is open-source software for running a scientific archive: validated records, version history, and query APIs. Designed for long-lived, public-facing, AI-ready data.

Deploy it on your infrastructure, then add the domain logic that makes it valuable.

A complete record lifecycle, built in.

Every serious archive needs the same core workflow. OSA provides it as a first-class system:

Deposition: Structured submission of data and metadata.
Validation: Rules at the boundary, so only valid records enter the archive.
Transformation: Domain-specific processing into derived artefacts and indexes.
Discovery: Query and export through stable APIs for downstream use.
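As a rough illustration of how these four stages fit together, here is a minimal in-memory sketch. It is not OSA's API: the names Record, Archive, deposit, query, require_title, and count_samples are all hypothetical, and a real archive would persist records rather than hold them in a list.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A deposited record: data plus metadata (hypothetical shape)."""
    data: dict
    metadata: dict
    derived: dict = field(default_factory=dict)


class Archive:
    """Toy in-memory archive, for illustration only (not OSA's API)."""

    def __init__(self, validators, transformers):
        self.validators = validators      # callables: Record -> list of error strings
        self.transformers = transformers  # callables: Record -> dict of derived values
        self.records = []

    def deposit(self, data, metadata):
        """Deposition: accept a structured submission and return its index."""
        record = Record(data=data, metadata=metadata)
        # Validation: reject at the boundary so invalid records never enter.
        errors = [msg for check in self.validators for msg in check(record)]
        if errors:
            raise ValueError("; ".join(errors))
        # Transformation: compute derived artefacts from the accepted record.
        for transform in self.transformers:
            record.derived.update(transform(record))
        self.records.append(record)
        return len(self.records) - 1

    def query(self, **criteria):
        """Discovery: return records whose metadata matches every criterion."""
        return [
            r for r in self.records
            if all(r.metadata.get(k) == v for k, v in criteria.items())
        ]


def require_title(record):
    """Example domain rule: every record needs a non-empty title."""
    return [] if record.metadata.get("title") else ["metadata.title is required"]


def count_samples(record):
    """Example derived artefact: number of samples in the submission."""
    return {"sample_count": len(record.data.get("samples", []))}
```

In this sketch, Archive([require_title], [count_samples]) accepts a deposit with a title and raises ValueError for one without, so the stored list only ever contains records that passed every rule.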

Customised for your domain.

OSA is intentionally domain-agnostic: it doesn't assume your schema or what "good data" means.

You define the scientific rules:

Validators that encode your standards and constraints
Transformations that generate derived data and searchable indexes
Policies for how records evolve over time, without losing provenance

OSA handles the archive machinery around them: lifecycle, versioning, and access.
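One way to picture a policy that lets records evolve without losing provenance is an append-only version history, where each revision keeps a digest of its content and a pointer to the version it replaced. The sketch below rests on that assumption; Version, VersionedRecord, revise, and content_digest are hypothetical names, not part of OSA or the OSA Protocol.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    """One immutable entry in a record's history (hypothetical shape)."""
    number: int
    content: dict
    digest: str
    parent_digest: Optional[str]
    reason: str


def content_digest(content: dict) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class VersionedRecord:
    """Append-only history: revisions add versions and never overwrite old ones."""

    def __init__(self, content: dict, reason: str = "initial deposition"):
        first = Version(1, dict(content), content_digest(content), None, reason)
        self.history = [first]

    @property
    def latest(self) -> Version:
        return self.history[-1]

    def revise(self, content: dict, reason: str) -> Version:
        """Append a new version linked to its parent; unchanged content is a no-op."""
        if not reason:
            raise ValueError("a revision must state its reason")
        digest = content_digest(content)
        if digest == self.latest.digest:
            return self.latest
        version = Version(
            self.latest.number + 1, dict(content), digest, self.latest.digest, reason
        )
        self.history.append(version)
        return version
```

Because earlier versions stay in the history and each new one records its parent's digest, a reader can walk back from the latest version to the original deposition and see why each change was made.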

Open-source, permissive, and built on a shared protocol.

OSA is licensed under Apache 2.0, so you can use it, extend it, and operate it with long-term control.

It's developed alongside the OSA Protocol, a community-governed specification for defining records, validation, and discovery across scientific archives.

Get involved

The specification is drafted and open for feedback. A reference implementation is being built.

Want to contribute code? Pick up an issue.