NEWS

scholid 0.2.0 (2026-06-04)

New identifier types

The package now supports 20 identifier types (up from 7 in 0.1.1). Each type supports structural validation, normalization from URLs and labels, and extraction from free text through the existing is_scholid(), normalize_scholid(), extract_scholid(), classify_scholid(), and detect_scholid_type() functions.

New types in this release:

ROR — Research Organization Registry IDs (checksum-validated)
RRID — Research Resource Identifiers
SWHID — Software Heritage persistent identifiers
OpenAlex — OpenAlex entity keys (W, A, S, …)
bibcode — SAO/NASA ADS bibliographic codes
ISNI — International Standard Name Identifier (compact form; hyphenated ORCID-shaped strings remain orcid)
ARK — Archival Resource Keys (ark:/NAAN/Name)
UniProt — UniProtKB accessions
refseq — NCBI RefSeq accessions (versioned)
sra — INSDC Sequence Read Archive accessions (SRR, SRX, SRP, …)
geo — NCBI GEO accessions (GSE, GSM, GPL, GDS)
bioproject — INSDC BioProject accessions (PRJNA, PRJEB, …)
assembly — INSDC genome assembly accessions (GCA_, GCF_, versioned)

Identifier definitions and validation rules are documented in the scholid_definitions vignette.

Internal improvements

Introduced a central identifier registry as the single source of truth for type names, classification order, extraction patterns, and per-type metadata.
Refactored per-type implementations to reduce duplication; exported APIs dispatch by naming convention (is_<type>, normalize_<type>, extract_<type>).
Optimized classify_scholid() and detect_scholid_type() to avoid redundant work when resolving types.
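
The registry and naming-convention dispatch described above can be illustrated with a minimal, self-contained sketch. It is not the package's actual implementation: the registry layout, the helper names (dispatch, is_id, normalize_id, classify_id), and the simplified patterns are all assumptions made for illustration.

```r
# Hypothetical per-type implementations. The patterns are deliberately
# simplified and are NOT the package's real validation rules.
is_doi <- function(x) grepl("^10\\.[0-9]{4,9}/[^[:space:]]+$", x)
normalize_doi <- function(x) {
  tolower(sub("^https?://(dx\\.)?doi\\.org/", "", trimws(x)))
}

is_geo <- function(x) grepl("^(GSE|GSM|GPL|GDS)[0-9]+$", x)
normalize_geo <- function(x) toupper(trimws(x))

# Hypothetical registry: one entry per type, listed in classification order.
registry <- list(
  doi = list(label = "DOI"),
  geo = list(label = "NCBI GEO accession")
)

# Resolve "<prefix>_<type>" by name, failing early on unregistered types.
dispatch <- function(prefix, type) {
  if (!type %in% names(registry)) {
    stop("unknown identifier type: ", type)
  }
  get(paste0(prefix, "_", type), mode = "function")
}

is_id <- function(x, type) dispatch("is", type)(x)
normalize_id <- function(x, type) dispatch("normalize", type)(x)

# Classify a single string by trying each registered type in registry order.
classify_id <- function(x) {
  for (type in names(registry)) {
    if (is_id(x, type)) return(type)
  }
  NA_character_
}

classify_id("GSE12345")
normalize_id("https://doi.org/10.1000/XYZ123", "doi")
```

In this sketch, adding a type means defining is_<type> and normalize_<type> and adding one registry entry; the dispatcher and the classifier need no changes. Because the classifier walks the registry in order, the order of entries decides which type wins when a string matches more than one pattern.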

scholid 0.1.1 (2026-04-24)

Bug fixes

Tightened normalization and validation behavior for checksum-based identifiers.
Improved consistency between detection, normalization, and validation for ISBN, ORCID, DOI, PMCID, and arXiv identifiers.
Fixed several edge cases in identifier parsing and canonicalization.

scholid 0.1.0 (2026-02-13)

Initial release.