Open Science Archive

An open-source, domain-agnostic scientific data archive.

01

Operate a trusted data archive for your field.

OSA is open-source software for running a scientific archive, with validated records, version history, and query APIs. It is designed for long-lived, public-facing, AI-ready data.

Deploy it on your infrastructure, then add the domain logic that makes it valuable.

02

A complete record lifecycle, built in.

Every serious archive needs the same core workflow. OSA provides it as a first-class system:

  • Deposition: structured submission of data and metadata.
  • Validation: rules at the boundary, so only valid records enter the archive.
  • Transformation: domain-specific processing into derived artefacts and indexes.
  • Discovery: query and export through stable APIs for downstream use.
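
To make the four stages concrete, here is a minimal in-memory sketch in Python. It is an illustration only, not OSA's API: `Record`, `ToyArchive`, `deposit`, `search`, `require_title`, and `count_values` are all hypothetical names chosen for this example.

```python
# Hypothetical sketch: none of these names come from OSA itself.
from dataclasses import dataclass, field


@dataclass
class Record:
    record_id: str
    data: dict
    metadata: dict
    derived: dict = field(default_factory=dict)


class ToyArchive:
    """In-memory stand-in that walks a record through the four stages."""

    def __init__(self, validators, transformers):
        self.validators = validators      # callables: Record -> list of error strings
        self.transformers = transformers  # callables: Record -> dict of derived fields
        self.records = {}

    def deposit(self, record):
        # Deposition: a structured submission arrives as a Record.
        # Validation: collect every error and reject at the boundary.
        errors = [e for check in self.validators for e in check(record)]
        if errors:
            raise ValueError("; ".join(errors))
        # Transformation: compute derived fields for later indexing.
        for transform in self.transformers:
            record.derived.update(transform(record))
        self.records[record.record_id] = record
        return record.record_id

    def search(self, **criteria):
        # Discovery: exact-match lookup on metadata fields.
        return [
            r for r in self.records.values()
            if all(r.metadata.get(k) == v for k, v in criteria.items())
        ]


def require_title(record):
    # Example validator: every record needs a non-empty title.
    return [] if record.metadata.get("title") else ["metadata.title is required"]


def count_values(record):
    # Example transformation: a derived field counting the data points.
    return {"n_values": len(record.data.get("values", []))}


archive = ToyArchive([require_title], [count_values])
archive.deposit(
    Record("r1", {"values": [1.0, 2.5, 4.0]}, {"title": "Run A", "instrument": "X"})
)
hits = archive.search(instrument="X")
```

A record that fails any validator raises before it is stored, which is one simple way to read "rules at the boundary". A real archive would persist records and expose discovery over a network API rather than a Python method.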
03

Customised for your domain.

OSA is intentionally domain-agnostic: it doesn't assume your schema or what "good data" means.

You define the scientific rules:

  • Validators that encode your standards and constraints
  • Transformations that generate derived data and searchable indexes
  • Policies for how records evolve over time, without losing provenance

OSA handles the archive machinery around it: lifecycle, versioning, and access.
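
As a rough sketch of that division of labour, the Python below pairs a domain-supplied validator with an append-only version history, so a revision adds a new version instead of overwriting the old one. Every name here (`Version`, `VersionedRecord`, `revise`, `temperature_in_kelvin`) is an assumption made for illustration; the OSA specification may define these concepts quite differently.

```python
# Hypothetical sketch: illustrative names only, not OSA's interfaces.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    number: int
    payload: dict
    reason: str             # why this version exists (provenance note)
    parent: Optional[int]   # version number this one was derived from


class VersionedRecord:
    """Append-only history: earlier versions are never modified or dropped."""

    def __init__(self, payload, validators):
        self.validators = validators  # callables: dict -> list of error strings
        self.history = []
        self.revise(payload, reason="initial deposition")

    def revise(self, payload, reason):
        # Domain rules run on every revision, not only the first deposit.
        errors = [e for check in self.validators for e in check(payload)]
        if errors:
            raise ValueError("; ".join(errors))
        parent = self.history[-1].number if self.history else None
        version = Version(len(self.history) + 1, dict(payload), reason, parent)
        self.history.append(version)
        return version

    @property
    def latest(self):
        return self.history[-1]


def temperature_in_kelvin(payload):
    # Example domain rule: temperatures are stored in kelvin, so never negative.
    t = payload.get("temperature_k")
    if not isinstance(t, (int, float)) or t < 0:
        return ["temperature_k must be a non-negative number"]
    return []


record = VersionedRecord({"temperature_k": 293.15}, [temperature_in_kelvin])
record.revise({"temperature_k": 294.0}, reason="recalibrated sensor")
```

The domain contributes only `temperature_in_kelvin`; the history bookkeeping is generic. Keeping each version's `parent` and `reason` is one simple way to let records evolve without losing provenance.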

04

Open-source, permissive, and built on a shared protocol.

OSA is licensed under Apache 2.0, so you can use it, extend it, and operate it with long-term control.

It's developed alongside the OSA Protocol, a community-governed specification for defining records, validation, and discovery across scientific archives.

Get involved

The specification is drafted and open for feedback. A reference implementation is being built.

Want to contribute code? Pick up an issue