Input identifiers
Identifier type


When providing a list of IDs or uploading a list, either:

Note that automatically inferring the type of IDs can lead to errors if the same ID is shared between multiple sources.

JSON files must be formatted according to the documentation (TODO INSERT LINK).


Processing may take up to several minutes depending on the size of the input.

StructRecon is a cheminformatics tool for automatically determining the structure of compounds from given database identifiers.

StructRecon traverses the cross-references between chemical databases in order to obtain a more complete view of the information available on any given compound. This allows StructRecon to resolve structures for database identifiers which may not directly contain the structure. In the case where the program finds multiple conflicting chemical structures, a random-walk based scoring algorithm is used to determine the most likely structure.
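The traversal idea described above can be illustrated with a small breadth-first search. This is only a sketch under stated assumptions, not StructRecon's actual code: the identifiers, the `XREFS` and `STRUCTURES` tables, and the function name `collect_structures` are all hypothetical, and a real run would query the chemical databases instead of in-memory dictionaries.

```python
from collections import deque

# Hypothetical cross-reference graph: identifier -> identifiers it links to.
XREFS = {
    "db_a:1": ["db_b:7"],
    "db_b:7": ["db_c:42", "db_a:1"],
    "db_c:42": [],
}

# Hypothetical structure records; only some entries carry a structure (SMILES).
STRUCTURES = {"db_c:42": "CCO"}


def collect_structures(start, xrefs, structures, max_steps=10):
    """Breadth-first walk over cross-references.

    Returns {identifier: structure} for every entry that has a structure
    and is reachable from `start` within `max_steps` links.
    """
    found = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if node in structures:
            found[node] = structures[node]
        if depth == max_steps:
            continue  # do not expand beyond the step limit
        for neighbour in xrefs.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return found


print(collect_structures("db_a:1", XREFS, STRUCTURES))
```

In this toy graph the starting identifier has no structure of its own, but one is found two cross-references away. Lowering `max_steps` to 1 stops the search before that entry is reached.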

StructRecon is written in Python, and the source code is available at GitLab. It accompanies the paper:

Reconciling Inconsistent Molecular Structures from Biochemical Databases
Casper Asbjørn Eriksen, Jakob Lykke Andersen, Rolf Fagerberg, Daniel Merkle (2023)
DOI:978-981-99-7074-2_5, arXiv preprint
Lecture Notes in Computer Science (LNBI, vol 14248), Proceedings of the International Symposium on Bioinformatics Research and Applications (ISBRA 2023)

If you choose to use StructRecon in your research, we kindly ask you to cite this paper.

This work is part of the MATOMIC project, sponsored by the Novo Nordisk Foundation grant NNF21OC0066551.


Parameters
Confidence Threshold Defines the maximum ratio of the second-highest confidence to the highest confidence at which the highest-confidence structure is still selected automatically. A smaller value is stricter.
Standardisation function order Defines which standardisation functions to use and the order in which they are applied. Entered as a string containing the letters F, I, C, T, S, each at most once.
Always apply standardisations Specifies standardisation functions that are always applied, even if a compound does not need to be standardised to have a consistent representation. Entered as a string containing the letters F, I, C, T, S, each at most once.
Standardisation threshold A value between 0 and 1. When choosing whether to standardise a structure, StructRecon will not consider alternate structures with relative confidence less than this value.
Link deprecated IDs Link newer and older alternate IDs.
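As an illustration of the parameter formats above, the following sketch checks a standardisation string and applies a confidence threshold. It is an assumption-laden example, not the tool's implementation: the function names are hypothetical, the check is case-sensitive by assumption, and the ratio is read as second-highest confidence divided by highest confidence.

```python
ALLOWED_LETTERS = "FICTS"


def parse_standardisation_order(value):
    """Return the letters of `value` as a list, in the order given.

    Raises ValueError if a letter is not one of F, I, C, T, S or if a
    letter occurs more than once.
    """
    letters = list(value)
    for letter in letters:
        if letter not in ALLOWED_LETTERS:
            raise ValueError(f"unknown standardisation function: {letter!r}")
    if len(set(letters)) != len(letters):
        raise ValueError("each letter may be used at most once")
    return letters


def select_automatically(confidences, threshold):
    """Return the index of the top structure, or None if the choice is unclear.

    The top structure is chosen only if (second-highest / highest)
    confidence does not exceed `threshold` (assumed interpretation).
    """
    if not confidences:
        return None
    ranked = sorted(range(len(confidences)),
                    key=lambda i: confidences[i], reverse=True)
    best = ranked[0]
    if len(ranked) == 1:
        return best
    if confidences[best] <= 0:
        return None
    ratio = confidences[ranked[1]] / confidences[best]
    return best if ratio <= threshold else None


print(parse_standardisation_order("TS"))
print(select_automatically([0.8, 0.1], threshold=0.5))
```

Under this reading, a threshold of 0.5 accepts the top structure only when its confidence is at least twice that of the runner-up.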
Sources




Enable or disable each data source in the analysis.
Ignore ontology relations Ignore ontology relations in ChEBI and PubChem.
Draw identifier graphs Draw identifier graphs as .svg files.
Draw molecule images Use RDKit to draw images of each molecule for visualisation purposes.
Draw confidence for all nodes Draw the confidence score for all nodes, not just standardised structures. By default, the relative confidence scores for the standard structures are shown. Enabling this parameter shows the absolute confidence score for all nodes.
Make statistical plots Make plots representing various statistics.
Infer SBML ids Interpret compound names in SBML as identifiers. It may be necessary to disable this option, depending on how the SBML file was generated.
Standardise charge to value in SBML file If the input is SBML and a charge is given, standardise to that specified charge.
Enforce charge from SBML file If the input is SBML and a charge is given, enforce that the chosen structure has the specified charge.
Enforce formula from SBML file If the input is SBML and a formula is given, enforce that the chosen structure conforms to the specified formula.
Confidence ranking decay A value between 0 and 1, indicating the decay of the confidence score, as nodes get further from the source. A higher value more strongly favours closer nodes. Set to 0 to use the ranking algorithm without this modification.
Ranking weight of deprecation edges A value between 0 and 1, indicating the edge weight used when considering ID deprecation relations within a database.
Ranking weight of common names A value between 0 and 1, indicating the weight of edges from Common Name nodes. A lower value may reduce spurious correlations. Set to 0 to automatically exclude nodes only connected by common names.
Only process subset Set this value to less than 1 to sample only a fraction of the input compounds. This should mainly be used to get quicker processing times while adjusting the parameters.
Maximum exploration steps Maximum number of steps to traverse the identifier graph.
Show extra vertices Show vertices only related by ontological relations.
Explore child entries Explore entries linked by the 'is instance of' relation. Not recommended, as it links many unrelated entries.
Show inherited explicit properties Show properties marked as explicit in the ancestors of each structure.
Show generation Show the steps necessary to discover each structure.
Show support of structure vertices See paper for details.
Show preliminary candidates Show the vertices marked as preliminary candidates. See the paper for more details.
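To give an intuition for how a decay value and edge weights can shape a random-walk score, as in the "Confidence ranking decay" and "Ranking weight" parameters above, here is a generic random walk with restart. This is a standard textbook technique swapped in for illustration; it is not the scoring algorithm from the paper, and the function name, graph format, and dead-end handling are assumptions.

```python
def random_walk_scores(edges, source, decay=0.2, iterations=100):
    """Random walk with restart on a weighted graph (illustrative only).

    edges: {node: {neighbour: weight}}; edges with weight 0 are ignored.
    decay: probability of jumping back to `source` at every step.
    Returns {node: score}; the scores sum to 1.
    """
    nodes = set(edges) | {source}
    for neighbours in edges.values():
        nodes.update(neighbours)

    scores = {node: 0.0 for node in nodes}
    scores[source] = 1.0
    for _ in range(iterations):
        updated = {node: 0.0 for node in nodes}
        for node, mass in scores.items():
            out = {n: w for n, w in edges.get(node, {}).items() if w > 0}
            total = sum(out.values())
            if total == 0:
                # Dead end: the walker returns to the source (assumption).
                updated[source] += (1 - decay) * mass
                continue
            for neighbour, weight in out.items():
                updated[neighbour] += (1 - decay) * mass * weight / total
        # Restart step: a fraction `decay` of all mass jumps to the source.
        updated[source] += decay * sum(scores.values())
        scores = updated
    return scores


# Hypothetical chain s -> a -> b, plus a zero-weight edge to n.
graph = {"s": {"a": 1.0, "n": 0.0}, "a": {"b": 1.0}, "b": {}}
scores = random_walk_scores(graph, "s", decay=0.5)
print(sorted(scores, key=scores.get, reverse=True))
```

In this sketch a larger `decay` concentrates the score near the source, and a zero-weight edge removes a node from consideration, which mirrors the qualitative behaviour the parameter descriptions mention.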