StructRecon

Input identifiers
StructRecon accepts input in one of four forms:

	List of identifiers (entered directly, with an identifier type)
	Upload list of identifiers (with an identifier type)
	Upload SBML file
	Upload JSON file

When providing a list of IDs or uploading a list, either:

Enter one identifier per line, and either select the type using the drop-down menu or let StructRecon infer it automatically; or
Enter one type and identifier per line, separated by a space, e.g. "pubchem 31348".
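As a rough illustration of the two line formats, the Python sketch below reads pasted input into (type, identifier) pairs. The function name `parse_identifier_lines` is hypothetical and not part of StructRecon; a single-token line is assumed to take the type chosen in the drop-down, with `None` standing for "infer automatically".

```python
from typing import List, Optional, Tuple


def parse_identifier_lines(
    text: str, default_type: Optional[str] = None
) -> List[Tuple[Optional[str], str]]:
    """Return (type, identifier) pairs from pasted input (hypothetical helper).

    One token per line: the identifier, typed with `default_type`
    (None stands for "let the type be inferred later").
    Two tokens per line: "<type> <identifier>", e.g. "pubchem 31348".
    """
    pairs: List[Tuple[Optional[str], str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        tokens = line.split()
        if not tokens:
            continue  # ignore blank lines
        if len(tokens) == 1:
            pairs.append((default_type, tokens[0]))
        elif len(tokens) == 2:
            pairs.append((tokens[0], tokens[1]))
        else:
            raise ValueError(f"line {lineno}: expected 1 or 2 fields, got {len(tokens)}")
    return pairs


print(parse_identifier_lines("pubchem 31348\nCHEBI:15377"))
# [('pubchem', '31348'), (None, 'CHEBI:15377')]
```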

Note that automatically inferring the type of IDs can lead to errors if the same ID is shared between multiple sources.
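To see why inference can go wrong, consider a bare number: it could plausibly be a ChEBI or a PubChem identifier. The sketch below uses deliberately simplified, hypothetical ID patterns (not the rules StructRecon actually applies) and returns every source an identifier could belong to; more than one match means the type is ambiguous and is best given explicitly.

```python
import re
from typing import Dict, List

# Hypothetical, simplified ID patterns for illustration only;
# these are NOT the inference rules used by StructRecon.
ID_PATTERNS: Dict[str, re.Pattern] = {
    "chebi": re.compile(r"(CHEBI:)?\d+"),
    "pubchem": re.compile(r"\d+"),
    "metanetx": re.compile(r"MNXM\d+"),
}


def candidate_types(identifier: str) -> List[str]:
    """Return every source whose pattern matches the whole identifier."""
    return [name for name, pattern in ID_PATTERNS.items() if pattern.fullmatch(identifier)]


print(candidate_types("MNXM1"))   # ['metanetx']
print(candidate_types("15377"))   # ['chebi', 'pubchem'], i.e. ambiguous
```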

JSON files must be formatted according to the documentation (TODO INSERT LINK).

Processing may take up to several minutes depending on the size of the input.

StructRecon is a cheminformatics tool for automatically determining the structure of compounds from given database identifiers.

StructRecon traverses the cross-references between chemical databases in order to obtain a more complete view of the information available on any given compound. This allows StructRecon to resolve structures for database identifiers which may not directly contain the structure. In the case where the program finds multiple conflicting chemical structures, a random-walk based scoring algorithm is used to determine the most likely structure.
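The exact scoring algorithm is described in the paper. The sketch below is only a generic random walk with restart over a small, made-up identifier graph, to illustrate the underlying idea: a structure reachable from the queried identifier along more (and shorter) paths accumulates more probability mass. The node names and the `restart` value are illustrative assumptions, not StructRecon's.

```python
from typing import Dict, List


def random_walk_with_restart(
    graph: Dict[str, List[str]],
    source: str,
    restart: float = 0.15,
    iterations: int = 100,
) -> Dict[str, float]:
    """Approximate visit probabilities of a walk that restarts at `source`.

    `graph` maps each node to its neighbours; undirected edges are listed both ways.
    """
    score = {node: 0.0 for node in graph}
    score[source] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        nxt[source] += restart  # jump back to the queried identifier
        for node, mass in score.items():
            neighbours = graph[node]
            if not neighbours:
                nxt[source] += (1 - restart) * mass  # dead end: return to source
                continue
            share = (1 - restart) * mass / len(neighbours)
            for nb in neighbours:
                nxt[nb] += share
        score = nxt
    return score


# Made-up graph: query Q links to X and Y; structure S1 is reachable via
# both X and Y, while the conflicting structure S2 is reachable only via Y.
GRAPH = {
    "Q": ["X", "Y"],
    "X": ["Q", "S1"],
    "Y": ["Q", "S1", "S2"],
    "S1": ["X", "Y"],
    "S2": ["Y"],
}
scores = random_walk_with_restart(GRAPH, "Q")
print(max(["S1", "S2"], key=scores.get))  # S1
```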

StructRecon is written in Python, and the source code is available on GitLab. It accompanies the paper:

Toward the Reconciliation of Inconsistent Molecular Structures from Biochemical Databases
Casper Asbjørn Eriksen, Jakob Lykke Andersen, Rolf Fagerberg, Daniel Merkle (2024)
DOI:10.1089/cmb.2024.0520
Journal of Computational Biology

If you choose to use StructRecon in your research, we kindly ask you to cite this paper.

This work is part of the MATOMIC project, sponsored by the Novo Nordisk Foundation grant NNF21OC0066551.

Parameters
Confidence Threshold		The maximum ratio between the confidence of the second-most confident structure and that of the most confident structure at which the most confident structure is still selected automatically. A smaller value is stricter.
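A minimal sketch of this decision rule, assuming the ratio compared against the threshold is second-best confidence divided by best confidence (which is consistent with a smaller value being stricter). The function name `auto_select` is hypothetical; `None` stands for "ambiguous, no automatic choice".

```python
from typing import Dict, Optional


def auto_select(confidence: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the top structure if it is clearly ahead, otherwise None.

    Assumes the ratio checked is second-best / best, so a smaller
    threshold requires a larger lead before selecting automatically.
    """
    ranked = sorted(confidence.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    if len(ranked) == 1:
        return ranked[0][0]  # no competitor: nothing to compare against
    (best, best_score), (_, second_score) = ranked[0], ranked[1]
    if best_score <= 0:
        return None
    return best if second_score / best_score <= threshold else None


print(auto_select({"A": 0.6, "B": 0.4}, 0.5))  # None (0.4 / 0.6 > 0.5)
```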

Standardisation function order		Define which standardisation functions to use and the order in which they are applied. Entered as a string containing the letters F, I, C, T and S, each at most once.

Always apply standardisations		Specify standardisation functions to always apply, even if a compound does not need to be standardised to have a consistent representation. Entered as a string containing the letters F, I, C, T and S, each at most once.
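Both fields share the same format, so one check covers them. The sketch below (hypothetical helper `parse_standardisation_order`) validates such a string and returns the letters in the order they would be applied; it does not assume anything about what each letter stands for.

```python
from typing import List

ALLOWED_LETTERS = set("FICTS")


def parse_standardisation_order(spec: str) -> List[str]:
    """Validate a string such as "FCT" and return its letters in order.

    Each letter must be one of F, I, C, T, S and may appear at most once.
    """
    letters = list(spec.strip())
    unknown = [c for c in letters if c not in ALLOWED_LETTERS]
    if unknown:
        raise ValueError(f"unknown standardisation letter(s): {''.join(unknown)}")
    if len(set(letters)) != len(letters):
        raise ValueError("each letter may appear at most once")
    return letters


print(parse_standardisation_order("FCT"))  # ['F', 'C', 'T']
```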

Standardisation threshold		A value between 0 and 1. When choosing whether to standardise a structure, StructRecon will not consider alternate structures with relative confidence less than this value.
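A small sketch of the filtering step. How StructRecon normalises confidence is described in the paper; here relative confidence is assumed, for illustration only, to be each score divided by the sum of all candidate scores. The helper name `structures_to_consider` is hypothetical.

```python
from typing import Dict


def structures_to_consider(confidence: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Keep alternative structures whose relative confidence is at least `threshold`.

    Relative confidence is assumed here to be score / sum of all scores
    (a hypothetical normalisation, not necessarily StructRecon's).
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    total = sum(confidence.values())
    if total <= 0:
        return {}
    relative = {s: c / total for s, c in confidence.items()}
    return {s: r for s, r in relative.items() if r >= threshold}


print(structures_to_consider({"A": 6.0, "B": 3.0, "C": 1.0}, 0.2))
```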

Link deprecated IDs		Link older (deprecated) IDs with their newer alternative IDs.

Sources		Enable or disable each data source in the analysis: BiGG, ChEBI, ECMDB, MetaNetX and PubChem.

Ignore ontology relations		Ignore ontology relations in ChEBI and PubChem.

Draw identifier graphs		Draw identifier graphs as .svg files.

Draw molecule images		Use RDKit to draw images of each molecule for visualisation purposes.

Draw confidence for all nodes		Draw the confidence score for all nodes, not just standardised structures. By default, the relative confidence scores for the standard structures are shown. Enabling this parameter shows the absolute confidence score for all nodes.

Make statistical plots		Make plots of various statistics about the run.

Infer SBML ids		Interpret compound names in SBML as identifiers. Depending on how the SBML file was generated, it may be necessary to disable this option.

Standardise charge to value in SBML file		If the input is SBML and a charge is given, standardise the chosen structure to the specified charge.

Enforce charge from SBML file		If the input is SBML and a charge is given, enforce that the chosen structure has the specified charge.

Enforce formula from SBML file		If the input is SBML and a formula is given, enforce that the chosen structure conforms to the specified formula. Options: "Yes, ignore hydrogen", "Yes, enforce hydrogen" and "No".
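A sketch of what enforcing the SBML charge and formula could look like, assuming "Yes, ignore hydrogen" means hydrogen counts are left out of the comparison. The helpers `parse_formula` and `matches_sbml` are hypothetical and only handle flat formulas such as "C6H12O6" (no brackets). The example compares pyruvic acid (C3H4O3, charge 0) with pyruvate (C3H3O3, charge -1), which differ only in one hydrogen and the charge.

```python
import re
from collections import Counter
from typing import Optional

_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Count atoms in a flat formula such as "C6H12O6" (no brackets or charges)."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts: Counter = Counter()
    for element, number in _ELEMENT.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def matches_sbml(
    candidate_formula: str,
    candidate_charge: int,
    sbml_formula: Optional[str] = None,
    sbml_charge: Optional[int] = None,
    ignore_hydrogen: bool = False,
) -> bool:
    """Check a candidate structure against the charge/formula given in SBML."""
    if sbml_charge is not None and candidate_charge != sbml_charge:
        return False
    if sbml_formula is not None:
        found, wanted = parse_formula(candidate_formula), parse_formula(sbml_formula)
        if ignore_hydrogen:  # assumed meaning of "Yes, ignore hydrogen"
            found.pop("H", None)
            wanted.pop("H", None)
        if found != wanted:
            return False
    return True


print(matches_sbml("C3H4O3", 0, "C3H3O3", ignore_hydrogen=True))  # True
```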

Confidence ranking decay		A value between 0 and 1 indicating how strongly the confidence score decays as nodes get further from the source. A higher value more strongly favours closer nodes. Set to 0 to use the ranking algorithm without this modification.
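The exact form of the decay is given in the paper. As one hypothetical way to get the described behaviour (0 leaves scores unchanged, higher values favour closer nodes more strongly), the sketch below multiplies each score by (1 - decay) raised to the node's distance from the source. The helper `apply_decay` is illustrative only.

```python
from typing import Dict


def apply_decay(scores: Dict[str, float], distance: Dict[str, int], decay: float) -> Dict[str, float]:
    """Scale each score by (1 - decay) ** distance (hypothetical decay form).

    decay = 0 leaves the scores unchanged; larger values penalise
    distant nodes more strongly.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must be between 0 and 1")
    return {node: s * (1.0 - decay) ** distance[node] for node, s in scores.items()}


print(apply_decay({"near": 1.0, "far": 1.0}, {"near": 1, "far": 3}, 0.5))
# {'near': 0.5, 'far': 0.125}
```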

Ranking weight of deprecation edges		A value between 0 and 1, indicating the edge weight used when considering ID deprecation relations within a database.

Ranking weight of common names		A value between 0 and 1, indicating the weight of edges from Common Name nodes. A lower value may reduce spurious correlations. Set to 0 to automatically exclude nodes only connected by common names.
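The two weights above can be pictured as per-relation edge weights. In the sketch below (hypothetical helpers and node names), deprecation and common-name edges take the configured weights, all other cross-references are assumed to keep weight 1, and zero-weight edges are dropped; nodes no longer reachable from the source, such as those connected only through common names when that weight is 0, are thereby excluded.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (node, node, relation type); hypothetical representation

EDGES: List[Edge] = [
    ("chebi:1", "pubchem:2", "xref"),
    ("chebi:1", "chebi:old", "deprecation"),
    ("chebi:1", "name:water", "common_name"),
    ("name:water", "pubchem:9", "common_name"),
]


def weighted_edges(edges: List[Edge], deprecation_weight: float,
                   common_name_weight: float) -> Dict[Tuple[str, str], float]:
    """Assign a weight per edge; other relations keep weight 1 (assumption)."""
    weight_of = {"deprecation": deprecation_weight, "common_name": common_name_weight}
    result: Dict[Tuple[str, str], float] = {}
    for u, v, kind in edges:
        w = weight_of.get(kind, 1.0)
        if w > 0:  # zero-weight edges are dropped entirely
            result[(u, v)] = w
    return result


def reachable_nodes(source: str, weights: Dict[Tuple[str, str], float]) -> Set[str]:
    """Nodes reachable from `source` via the remaining (undirected) edges."""
    adjacency: Dict[str, List[str]] = {}
    for u, v in weights:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    seen, stack = {source}, [source]
    while stack:
        for nb in adjacency.get(stack.pop(), []):
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen


print(sorted(reachable_nodes("chebi:1", weighted_edges(EDGES, 0.5, 0.0))))
# ['chebi:1', 'chebi:old', 'pubchem:2']
```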

Only process subset		Set this value to less than 1 to sample only a fraction of the input compounds. This is mainly intended for getting quicker processing times while adjusting the parameters.
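A minimal sketch of such subsampling with Python's standard `random` module, keeping the sampled compounds in their input order. The helper name `sample_subset` and the optional `seed` (for reproducible runs while tuning) are assumptions, not StructRecon options.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_subset(compounds: Sequence[T], fraction: float, seed: Optional[int] = None) -> List[T]:
    """Randomly keep about `fraction` of the compounds, preserving input order."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if fraction == 1.0:
        return list(compounds)
    k = max(1, round(fraction * len(compounds))) if compounds else 0
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(compounds)), k))  # indices in input order
    return [compounds[i] for i in chosen]


print(sample_subset(list(range(100)), 0.1, seed=42))
```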

Maximum exploration steps		Maximum number of steps to traverse the identifier graph.

Show extra vertices		Show vertices only related by ontological relations.

Explore child entries		Explore entries linked by the 'is instance of' relation. Not recommended, as this links many unrelated entries.
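The step limit and the child-entry option can be pictured as a bounded breadth-first traversal. In this sketch the graph format, the relation label "child" (an edge to an entry that is an instance of the current one) and the helper `explore` are hypothetical; the returned step counts correspond to how many steps were needed to discover each node.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical graph format: node -> list of (neighbour, relation label)
Graph = Dict[str, List[Tuple[str, str]]]


def explore(graph: Graph, start: str, max_steps: int, explore_children: bool = False) -> Dict[str, int]:
    """Breadth-first traversal returning the step at which each node is found."""
    found = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        step = found[node]
        if step >= max_steps:
            continue  # do not expand beyond the step limit
        for nb, relation in graph.get(node, []):
            if relation == "child" and not explore_children:
                continue  # skip 'is instance of' children unless enabled
            if nb not in found:
                found[nb] = step + 1
                queue.append(nb)
    return found


GRAPH: Graph = {
    "A": [("B", "xref"), ("C", "child")],
    "B": [("D", "xref")],
    "D": [("E", "xref")],
}
print(explore(GRAPH, "A", max_steps=2))  # {'A': 0, 'B': 1, 'D': 2}
```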

Show inherited explicit properties		Show properties marked as explicit in the ancestors of each structure.

Show generation		Show the steps necessary to discover each structure.

Show support of structure vertices		Show the support of each structure vertex. See the paper for details.

Show preliminary candidates		Show the vertices marked as preliminary candidates. See the paper for more details.