Friday, 24 April, 2026
Reference Database
The molecules_reference.db reference database is the shared data foundation that powers the entire molecular component of IsoFind. It contains 156 standardized molecules across 11 families, 50 tabulated degradation pathways, 56 isotopic fractionations, and 49 parent-metabolite relationships. This page describes the technical structure of this database, the distinction between the global reference and the user catalog, querying methods, and enrichment procedures.
General Structure
The reference database is a SQLite database that projects open in read-only mode. It is distributed with the application and updated with each IsoFind release. Five tables hold the data, linked by foreign keys that keep it consistent.
| Table | Role | Rows |
|---|---|---|
| ref_molecules | Main molecule catalog with metadata and thresholds | 156 |
| molecule_degradation_pathways | Degradation pathways with conditions, kinetics, metabolites | 50 |
| molecule_isotope_fractionation | Isotopic fractionations by pathway and element | 56 |
| molecule_metabolites | Parent-metabolite relations with yields | 49 |
| ref_molecules_isotopes | Isotopic interpretation by molecule and element | 18 |
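As a minimal sketch of the read-only access described above, the file can be opened through a SQLite URI with mode=ro and the row counts of the five tables read back. The helper names open_reference_readonly and table_row_counts are hypothetical, and the file path is left to the caller because it depends on the installation.

```python
import sqlite3
from pathlib import Path

# Table names taken from the table above.
REFERENCE_TABLES = (
    "ref_molecules",
    "molecule_degradation_pathways",
    "molecule_isotope_fractionation",
    "molecule_metabolites",
    "ref_molecules_isotopes",
)


def open_reference_readonly(db_path: str) -> sqlite3.Connection:
    """Open a SQLite file in read-only mode (hypothetical helper)."""
    # mode=ro makes SQLite refuse any write on this connection.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def table_row_counts(conn: sqlite3.Connection, tables=REFERENCE_TABLES) -> dict:
    """Return {table_name: row_count} for each listed table."""
    counts = {}
    for name in tables:
        # Names come from a fixed tuple, never from user input.
        counts[name] = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
    return counts
```

Against the distributed file, table_row_counts would be expected to return the counts listed in the table, and any INSERT or UPDATE on the connection fails with sqlite3.OperationalError.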
Table ref_molecules: The 24 Fields
The main table carries 24 columns covering chemical identity, taxonomy, regulatory thresholds, analytical limits, and catalog management. Some fields are systematically filled, while others are optional depending on the molecule.
| Field | Type | Default | Filling Status |
|---|---|---|---|
| id | INTEGER | - | Auto-incremented primary key |
| nom | TEXT | - | Systematic (common name, mandatory) |
| nom_iupac | TEXT | - | Optional, 49% filled (80 nulls out of 156) |
| cas | TEXT | - | Systematic for catalog molecules |
| formule | TEXT | - | Systematic (chemical formula) |
| masse_molaire | REAL | - | Systematic (g/mol) |
| mz_principal | REAL | - | Systematic (main MS transition) |
| mz_secondaires | TEXT | - | 86% filled (comma-separated list) |
| famille | TEXT | - | Systematic (mandatory, 11 values) |
| sous_famille | TEXT | - | Systematic |
| type_polluant | TEXT | - | Organic (144) or inorganic (12) |
| niveau_acces | TEXT | 'gratuit' | free / pro / defense |
| seuil_eau | REAL | - | 77% filled (36 molecules without EU/EPA threshold) |
| seuil_sol | REAL | - | 24% filled (scarcely documented in soil regulations) |
| seuil_unit | TEXT | 'µg/L' | 149 µg/L, 4 pg/L (dioxins), 3 mg/L |
| reglementation | TEXT | - | Free text describing applicable frameworks |
| notes | TEXT | - | 74% filled (scientific comments) |
| unite_defaut | TEXT | 'µg/L' | 101 µg/L, 45 ng/L (PFAS, PAH), 6 pg/L, 4 mg/L |
| lod | REAL | - | Analytical Limit of Detection |
| loq | REAL | - | Limit of Quantification |
| methode_ref | TEXT | - | Applicable analysis standards (ISO, EPA, EN) |
| version_db | TEXT | '2.0' | Data version (126 in 2.0, 30 in 2) |
| actif | INTEGER | 1 | Logical deactivation flag, 156 active |
| created_at | TEXT | datetime('now') | Timestamp of record creation |
The actif flag removes a molecule from the visible catalog without physically deleting it, so historical measurements that refer to a deactivated molecule keep a valid reference. All 156 current molecules are active.
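The field table can be read as a table definition. The sketch below reconstructs one from the names, types, and defaults listed above; the NOT NULL constraints on nom and famille are an assumption inferred from the word "mandatory", and the actual DDL shipped with IsoFind may differ. It then shows how the actif flag filters the visible catalog.

```python
import sqlite3

# Reconstructed from the field table; constraints are assumptions.
REF_MOLECULES_DDL = """
CREATE TABLE ref_molecules (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,
    nom            TEXT NOT NULL,
    nom_iupac      TEXT,
    cas            TEXT,
    formule        TEXT,
    masse_molaire  REAL,
    mz_principal   REAL,
    mz_secondaires TEXT,
    famille        TEXT NOT NULL,
    sous_famille   TEXT,
    type_polluant  TEXT,
    niveau_acces   TEXT DEFAULT 'gratuit',
    seuil_eau      REAL,
    seuil_sol      REAL,
    seuil_unit     TEXT DEFAULT 'µg/L',
    reglementation TEXT,
    notes          TEXT,
    unite_defaut   TEXT DEFAULT 'µg/L',
    lod            REAL,
    loq            REAL,
    methode_ref    TEXT,
    version_db     TEXT DEFAULT '2.0',
    actif          INTEGER DEFAULT 1,
    created_at     TEXT DEFAULT (datetime('now'))
)
"""


def active_molecules(conn: sqlite3.Connection) -> list:
    """Return (id, nom) for molecules still visible in the catalog."""
    sql = "SELECT id, nom FROM ref_molecules WHERE actif = 1 ORDER BY id"
    return conn.execute(sql).fetchall()


# In-memory demonstration with one active and one deactivated row.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(REF_MOLECULES_DDL)
demo_conn.execute("INSERT INTO ref_molecules (nom, famille) VALUES ('PFOA', 'PFAS')")
demo_conn.execute(
    "INSERT INTO ref_molecules (nom, famille, actif) VALUES ('Retired example', 'PFAS', 0)"
)
print(active_molecules(demo_conn))  # the deactivated row is filtered out
```

The deactivated row stays in the table, so its id remains resolvable even though it no longer appears in catalog queries.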
Full Example: PFOA Data Sheet
Below is the exact content of a database record, illustrating the typical richness of a well-documented entry.
id = 2
nom = 'PFOA'
nom_iupac = 'Perfluorooctanoic acid'
cas = '335-67-1'
formule = 'C8HF15O2'
masse_molaire = 414.07
mz_principal = 413.0
mz_secondaires = '169.0,219.0,269.0,319.0,369.0'
famille = 'PFAS'
sous_famille = 'PFCA-C8'
type_polluant = 'organique'
niveau_acces = 'gratuit'
seuil_eau = 0.1
seuil_sol = 2.0
seuil_unit = 'µg/L'
reglementation = 'EU 2020/2184 (sum of 4 PFAS ≤0.10); REACH Ann.XVII banned manuf. 2020'
unite_defaut = 'ng/L'
lod = 0.001
loq = 0.005
methode_ref = 'ISO 21675:2019; EN 17892:2023; EPA 537.1'
version_db = '2.0'
actif = 1
Several points are worth noting. The threshold is expressed in µg/L, but the default display unit is ng/L, because laboratories report PFAS in ng/L for readability; the 0.1 µg/L threshold corresponds to 100 ng/L. IsoFind automatically converts both values to a common unit before comparing a measurement against the threshold. Secondary MS transitions are stored as comma-separated text for flexibility, and three method standards are cited together to cover differing practices between US and European labs.
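The unit normalization and the comma-separated transition list can be illustrated with a short sketch. It assumes a simple conversion of every value to µg/L before comparison; the function names are hypothetical, and IsoFind's internal implementation is not documented here.

```python
# Conversion factors to µg/L for the units that appear in the database.
TO_UG_PER_L = {"mg/L": 1e3, "µg/L": 1.0, "ng/L": 1e-3, "pg/L": 1e-6}


def to_ug_per_l(value: float, unit: str) -> float:
    """Convert a concentration to µg/L (raises KeyError on an unknown unit)."""
    return value * TO_UG_PER_L[unit]


def exceeds_threshold(value: float, unit: str, seuil: float, seuil_unit: str) -> bool:
    """Compare a measurement with a threshold after normalizing both to µg/L."""
    return to_ug_per_l(value, unit) > to_ug_per_l(seuil, seuil_unit)


def parse_mz_secondaires(text) -> list:
    """Split the comma-separated mz_secondaires field into floats."""
    if not text:
        return []
    return [float(part) for part in text.split(",") if part.strip()]


# PFOA record: threshold 0.1 µg/L, measurements reported in ng/L.
print(exceeds_threshold(150.0, "ng/L", 0.1, "µg/L"))  # True
print(exceeds_threshold(50.0, "ng/L", 0.1, "µg/L"))   # False
print(parse_mz_secondaires("169.0,219.0,269.0,319.0,369.0"))
```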
Reference vs. User Catalog
IsoFind distinguishes between two storage levels for molecules: the shared reference database (ref_molecules) and the user catalog specific to each project (user_molecules). This distinction is fundamental to the IsoFind data model.
| Aspect | ref_molecules (Reference) | user_molecules (User Catalog) |
|---|---|---|
| Status | Read-only, distributed with IsoFind | Editable, project-specific |
| Scope | Common to all projects | Isolated per project |
| Evolution | Updated during IsoFind releases | Controlled by the user |
| Deletion | Impossible, deactivation via "actif" flag | Possible at the project level |
| Referenced by Measurements | No (no direct link) | Yes (foreign key molecule_id) |
To use a reference molecule in a project, it must be explicitly imported into user_molecules. This one-time copy allows the user to locally adjust thresholds or LOQs without impacting other projects. The dedicated endpoint for this operation is POST /api/molecules/reference/{ref_id}/importer.
The import detects duplicates by CAS number: if the user catalog already contains a molecule with the same CAS, the import returns the existing ID instead of creating a second entry. This protects against accidental repeated imports, but it does not catch duplicates entered manually under a different name and a different CAS number.
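The import logic can be sketched as follows. This is an illustration under stated assumptions, not the IsoFind implementation: both tables live in one in-memory database for brevity, the user_molecules columns are hypothetical, and only a few fields are copied.

```python
import sqlite3


def import_reference_molecule(conn: sqlite3.Connection, ref_id: int) -> int:
    """Copy a reference molecule into user_molecules, deduplicating on CAS."""
    ref = conn.execute(
        "SELECT nom, cas, seuil_eau, loq FROM ref_molecules WHERE id = ?", (ref_id,)
    ).fetchone()
    if ref is None:
        raise LookupError(f"no reference molecule with id {ref_id}")
    nom, cas, seuil_eau, loq = ref

    # Same CAS already in the user catalog: return its ID, create nothing.
    existing = conn.execute(
        "SELECT id FROM user_molecules WHERE cas = ?", (cas,)
    ).fetchone()
    if existing is not None:
        return existing[0]

    cur = conn.execute(
        "INSERT INTO user_molecules (nom, cas, seuil_eau, loq) VALUES (?, ?, ?, ?)",
        (nom, cas, seuil_eau, loq),
    )
    conn.commit()
    return cur.lastrowid


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database with hypothetical, reduced schemas."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ref_molecules (
            id INTEGER PRIMARY KEY, nom TEXT, cas TEXT, seuil_eau REAL, loq REAL);
        CREATE TABLE user_molecules (
            id INTEGER PRIMARY KEY, nom TEXT, cas TEXT, seuil_eau REAL, loq REAL);
        INSERT INTO ref_molecules VALUES (2, 'PFOA', '335-67-1', 0.1, 0.005);
        """
    )
    return conn
```

Because the copy is independent, updating seuil_eau or loq in user_molecules afterwards leaves the reference row untouched, which is what allows per-project adjustments.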
Reference Module Endpoints
Six endpoints expose the reference database to client applications, all under the prefix /api/molecules/reference/.
| Method | Path | Usage |
|---|---|---|
| GET | /reference/catalogue | Catalog filtered by family, access level, and text search |
| GET | /reference/familles | List of the 11 families with counts and access levels |
| GET | /reference/{ref_id} | Full data sheet of a reference molecule |
| GET | /reference/{ref_id}/isotopes | Associated isotopic data (CSIA and interpretations) |
| POST | /reference/{ref_id}/importer | Copies a molecule to the project's user_molecules |
| POST | /reference/importer-batch | Batch import using a list of identifiers |
The catalog endpoint accepts three optional parameters: famille (exact filter), niveau_acces (free / pro / defense), and q (text search on name, CAS, formula). The default limit is 200 molecules per request, adjustable via limit.
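A client can assemble the catalog request from these parameters. The sketch below only builds the URL; the base address is a placeholder assumption, and the function name build_catalogue_url is hypothetical. Whether niveau_acces expects the stored value (such as 'gratuit') or a translated label is not specified here, so the example uses 'pro'.

```python
from urllib.parse import urlencode


def build_catalogue_url(base_url, famille=None, niveau_acces=None, q=None, limit=None):
    """Build the GET URL for /api/molecules/reference/catalogue."""
    params = {}
    if famille is not None:
        params["famille"] = famille            # exact family filter
    if niveau_acces is not None:
        params["niveau_acces"] = niveau_acces  # access-level filter
    if q is not None:
        params["q"] = q                        # text search: name, CAS, formula
    if limit is not None:
        params["limit"] = limit                # server default is 200
    url = f"{base_url.rstrip('/')}/api/molecules/reference/catalogue"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


# "http://localhost:8000" is a placeholder, not a documented address.
print(build_catalogue_url("http://localhost:8000", famille="PFAS", niveau_acces="pro", limit=50))
```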
The Families Endpoint
The /reference/familles endpoint is useful for populating navigation interfaces. It returns the list of families with their count and breakdown by access level, allowing the UI to display the number of available molecules per family based on the current license.
| Family | Molecules | free / pro / defense |
|---|---|---|
| Pesticides | 38 | Distributed across molecules |
| PFAS | 26 | 23 / 3 / 0 |
| Pharmaceuticals / EDs | 21 | Mixed |
| PAHs | 19 | 1 / 18 / 0 |
| Chlorinated Solvents | 16 | 4 / 12 / 0 |
| Explosives | 12 | Significant defense share |
| PCBs | 9 | Mixed |
| Perchlorates | 4 | Free |
| Dioxins / Furans | 4 | Pro |
| Cyanides | 4 | Free |
| Inorganics (oxyanions) | 3 | Free |
| Total | 156 | 67 / 77 / 12 |
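The per-family breakdown can be derived from ref_molecules with a single aggregate query. The sketch below is one possible way to compute it, not the endpoint's actual code; it assumes the access levels are stored as 'gratuit', 'pro', and 'defense' (only 'gratuit' is confirmed by the column default).

```python
import sqlite3

# In SQLite a comparison evaluates to 0 or 1, so SUM() counts the matches.
FAMILY_BREAKDOWN_SQL = """
SELECT famille,
       COUNT(*)                        AS total,
       SUM(niveau_acces = 'gratuit')   AS gratuit,
       SUM(niveau_acces = 'pro')       AS pro,
       SUM(niveau_acces = 'defense')   AS defense
FROM ref_molecules
WHERE actif = 1
GROUP BY famille
ORDER BY total DESC, famille
"""


def family_breakdown(conn: sqlite3.Connection) -> list:
    """Return one dict per family with its total and per-level counts."""
    cur = conn.execute(FAMILY_BREAKDOWN_SQL)
    names = [col[0] for col in cur.description]
    return [dict(zip(names, row)) for row in cur]
```

A navigation UI could then show, for each family, only the counts that the current license can access.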
Consistency and Validation
The reference database is verified at each IsoFind release by a series of automated checks that cover both structure and content.
- CAS Uniqueness: No duplicates allowed on the cas field for active molecules.
- Family Consistency: The famille field value must belong to the closed list of 11 official families.
- Threshold Plausibility: Values for seuil_eau must fall within plausible bounds: strictly positive and below 10 mg/L (10,000 µg/L once converted).
- Foreign Keys: Molecules cited in molecule_degradation_pathways, molecule_isotope_fractionation, and molecule_metabolites must all exist in ref_molecules.
- Essential Metadata: Name, CAS, formula, molar mass, and family are mandatory.
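The checks listed above could be sketched as a single validation pass. This is an illustrative approximation, not the release tooling: the function name is hypothetical, the list of official family names is passed in because their stored spellings are not given here, and the foreign-key check relies on SQLite's PRAGMA foreign_key_check, which only sees constraints declared in the schema.

```python
import sqlite3

TO_UG_PER_L = {"mg/L": 1e3, "µg/L": 1.0, "ng/L": 1e-3, "pg/L": 1e-6}
MAX_SEUIL_EAU_UG_PER_L = 10_000.0  # 10 mg/L expressed in µg/L


def check_reference(conn: sqlite3.Connection, official_families) -> list:
    """Return a list of human-readable problems (empty when all checks pass)."""
    problems = []

    # CAS uniqueness among active molecules.
    dup_sql = (
        "SELECT cas, COUNT(*) FROM ref_molecules "
        "WHERE actif = 1 AND cas IS NOT NULL GROUP BY cas HAVING COUNT(*) > 1"
    )
    for cas, n in conn.execute(dup_sql):
        problems.append(f"duplicate CAS {cas} on {n} active molecules")

    row_sql = (
        "SELECT id, nom, cas, formule, masse_molaire, famille, seuil_eau, seuil_unit "
        "FROM ref_molecules"
    )
    for mol_id, nom, cas, formule, masse, famille, seuil, unit in conn.execute(row_sql):
        # Essential metadata: name, CAS, formula, molar mass, family.
        if None in (nom, cas, formule, masse, famille):
            problems.append(f"molecule {mol_id}: missing essential metadata")
        # Family must belong to the closed list.
        if famille is not None and famille not in official_families:
            problems.append(f"molecule {mol_id}: unknown family {famille!r}")
        # Threshold plausibility, after conversion to µg/L.
        if seuil is not None:
            factor = TO_UG_PER_L.get(unit)
            if factor is None or not (0 < seuil * factor < MAX_SEUIL_EAU_UG_PER_L):
                problems.append(f"molecule {mol_id}: implausible seuil_eau {seuil} {unit}")

    # Orphan rows in child tables that declare a foreign key.
    for table, rowid, parent, _fk_index in conn.execute("PRAGMA foreign_key_check"):
        problems.append(f"{table} row {rowid}: no matching row in {parent}")

    return problems
```

Returning a list rather than raising on the first failure lets a release pipeline report every inconsistency in one run.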
Enrichment and Evolution
The database evolves by versions. The version_db field of each record indicates the version under which it was created or last modified. Most records carry version 2.0 (126 records); the remaining 30 still carry the label 2 and are older additions that have not been edited since the format migration.
Future enrichments focus on four identified axes.
| Enrichment Axis | Planned Examples |
|---|---|
| Molecular Catalog Extension | BTEX, C10-C40 hydrocarbons, phthalates, brominated flame retardants |
| CSIA Densification | CSIA fractionations for neonicotinoid pesticides, missing chlorinated solvents |
| Missing Degradation Pathways | Aerobic pathways for 4+ ring PAHs, aqueous PFAS photolysis |
| Soil Thresholds | Closing the gap for the seuil_sol field, currently only 24% filled |
User contributions to the reference database can be submitted to IsoFind SAS for inclusion in future releases. The procedure requires a verifiable bibliographic reference for each addition. Purely local enrichments remain stored in the project's user catalog without being uploaded to the shared repository.
Backup and Integrity
The reference database is a binary SQLite file. Its disk location depends on the IsoFind installation configuration. It is loaded at startup and cached for frequent queries. Accidental file corruption is detected at loading via integrity checks; in such cases, the molecular module returns empty lists with the flag ref_disponible: false rather than throwing an error that would block the application.
This graceful degradation lets the user keep working on existing measurements (which point to user_molecules) while the reference database is restored. Restoration simply consists of replacing the file with the one provided by the installer; no data migration is necessary.
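The degraded mode can be sketched as a loader that never raises. The example assumes the integrity check is SQLite's PRAGMA integrity_check and that the payload uses a molecules key next to ref_disponible; both are assumptions, since only the ref_disponible flag is documented above.

```python
import sqlite3
from pathlib import Path


def load_reference_catalogue(db_path: str) -> dict:
    """Load active molecules, or an empty payload if the file is unusable."""
    unavailable = {"ref_disponible": False, "molecules": []}
    try:
        uri = Path(db_path).resolve().as_uri() + "?mode=ro"
        conn = sqlite3.connect(uri, uri=True)
        try:
            # integrity_check returns a single row 'ok' on a healthy file.
            status = conn.execute("PRAGMA integrity_check").fetchone()[0]
            if status != "ok":
                return unavailable
            rows = conn.execute(
                "SELECT id, nom FROM ref_molecules WHERE actif = 1 ORDER BY id"
            ).fetchall()
        finally:
            conn.close()
    except sqlite3.Error:
        # Missing file, corrupted header, or missing table: degrade, do not raise.
        return unavailable
    return {
        "ref_disponible": True,
        "molecules": [{"id": mol_id, "nom": nom} for mol_id, nom in rows],
    }
```

A caller can branch on ref_disponible to hide reference features in the UI while the rest of the project, backed by user_molecules, stays usable.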
Further Reading
- Molecular Component: Overview of the 11 families and database navigation.
- Degradation Pathways: Details of the 50 pathways and their integration into the simulation engine.
- CSIA Isotopy: Structure of the CSIA bridge utilizing the database.
- IsoFind API: General endpoints beyond the molecules module.