Welcome to KNUTS-DB

About KNUTS-DB

Databases specifying the nutrients and allergens present in multi-ingredient foods are required to explore the effect of food consumption on health outcomes accurately. The phytochemicals found in tree nuts have been associated with antioxidant, anti-inflammatory, antiproliferative, antiviral, chemopreventive and hypocholesterolaemic actions, all of which are known to affect the initiation and progression of several pathogenic processes. We have developed KNUTS-DB - a data-driven knowledge database for nuts - containing information on the chemical nutrients, molecular targets, pathways, disease associations, and clinically defined allergen properties of peanuts and tree nuts. The database contains data sets associated with rich and diverse metadata on almond, cashew, pecan, hazelnut, walnut, pistachio and peanut. Additionally, the database allows users to perform pathway and GO-Term enrichment analysis for user-defined chemical nutrients. Heatmaps and network plots can also be generated. The database can be searched by similarity, interaction values, allergen type and nut type, and can support researchers in estimating the chemical composition of nuts and associations between nut composition and health outcomes, with deeper insights into the molecular mechanisms.
KNUTS-DB is freely available to all users, without any login or registration.

Navigation

Please feel free to explore the database using the menu bar on the left side, or by clicking on the buttons below.

If you have any open questions, please have a look at the FAQs or feel free to contact us.



Please Cite:

Emanuel Kemmler, Margitta Worm, Robert Preissner and Priyanka Banerjee.
KNUTS-DB - a data-driven knowledge database for NUTS.
Toxicol Mech Methods 2025 Apr 29:1-8. DOI: 10.1080/15376516.2025.2496752.

How to search protein interactions

Search for specific protein-compound interactions. Please select the value type and cutoff. Optionally, the 'Nuts', 'Compounds', 'Proteins' and 'Pathway' fields can be used to further specify the query. Proteins can be searched by name or UniProt ID.

Search

Data

Download data

How to search by similarity

This page can be used to search for similar compounds. Enter your compound either by name or by SMILES. Click on an entry of the result page to show the corresponding data set.

Search by similarity

Result

Nut compounds

Data about nut compounds. Click on an entry to show the corresponding data set.

Nut ingredients

Number of allergens

Number of epitopes

Number of ingredients

Number of proteins with interaction

Number of pathways with interaction

Types of interaction values

'Full data set'

Shows the full connection between nut compounds, proteins and pathways in one table. Click on entries for further information. For the actual interaction values, please have a look at the 'Protein interactions' page.

Full data set

Download selected data

Protein interactions

Data about protein interactions. Click on entries for further information. Interaction values were gathered from lab experiments. Interactions of type 'KEGG' are interactions approved by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology. To account for this, the corresponding interaction values were set to 1.

Pathway interactions

Data about pathway interactions. Click on entries for further information.

Pathway interactions

User-defined data set

To build a custom data set, use the 'Add data to user data set' buttons on the 'Protein interactions', 'Pathway interactions' and 'Full data set' pages.

User selected data

Epitopes

Information about nut epitopes. Click on entries for parent allergen(s) and the corresponding start and end positions.

Epitopes by Nut

GO-Term Enrichment

Perform a GO-Term enrichment analysis. Select data either from the complete database or from the user-selected data set. Specify the query through the different selection fields. Use 'Enrich GO-Terms' to show over-represented GO-Terms.

Input

Pathway Enrichment

Perform a pathway enrichment analysis. Select data either from the complete database or from the user-selected data set. Specify the query through the different selection fields. Set an interaction value range if needed. Use 'Enrich Pathways' to show over-represented pathways.

Input

Heatmap plot

Generate a heatmap plot. Select data either from the complete database or from the user-selected data set. Specify the query through the different selection fields. Set an interaction value range if needed. The coloring corresponds to the interaction values.

Input

Network plot

Generate an interactive network plot. Select data either from the complete database or from the user-selected data set. Specify the query through the different selection fields. Set an interaction value range if needed. Blue nodes correspond to compounds and orange nodes to proteins. The size of a node corresponds to its number of interactions/connections, and the thickness of a connection to the interaction value. For a better view, it is possible to zoom and to move nodes around.

Input

What is KNUTS-DB?

A data-driven knowledge database for nuts containing information on chemical nutrients, molecular targets, pathways, disease associations and clinically defined allergen properties. The database contains data sets associated with rich and diverse metadata on almond, cashew, pecan, hazelnut, walnut, pistachio and peanut.

Database configuration

The webpage is implemented as a Shiny web app written in the R programming language. It is hosted on a Shiny Server instance running on a Linux server within the Charité IT system. Shiny incorporates CSS definitions as well as JavaScript. The front end is implemented using the package “shinydashboard”. The tables on the webpage were created with the package “DT”, an R interface to the well-known DataTables library. The data is stored in a relational MySQL database. The database connection is handled through R using SQL queries. For processing, depiction and data integration, different R packages were used, most notably “tidyverse”, “ggplot2”, “UniprotR” and “webchem”. The web app was tested on the most recent versions of Apple Safari, Google Chrome and Mozilla Firefox.

Allergens

Data on individual allergens for the respective nuts were collected from different databases and literature sources, and data on additional allergenic proteins are also reported in this section. The database can be searched by nut, allergen name, scientific name, physicochemical properties, GenBank nucleotide ID, GenBank protein ID and UniProt ID. The filtered data can then be downloaded. Selecting a single entry reveals the amino acid sequence, information about allergen-specific epitopes and cross-references.

Nut ingredients

The information about the nut composition was collected from different platforms: FoodData Central, FooDB, Phenol-Explorer, Duke’s database and an in-house resource, SuperTCM. The data was then deduplicated by InChI key, and incomplete entries were removed. Additional data was obtained using an in-house text-mining pipeline based on PubMed queries, followed by manual curation by domain experts. Furthermore, datasets were obtained and integrated from published literature sources. The chemical ingredients present in the respective nuts can be searched by name, IUPAC name, SMILES, PubChem ID and protein interaction data. It is possible to click on an entry to show the corresponding data set, e.g. protein and pathway interactions.
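The cleaning step described above (removing incomplete entries and duplicates by InChI key) can be illustrated with a minimal Python sketch. The field names, the list of required fields and the placeholder keys are assumptions made for illustration only; they do not reflect the actual KNUTS-DB schema or pipeline.

```python
# Minimal sketch: drop incomplete entries, then deduplicate by InChI key.
# Field names and REQUIRED_FIELDS are hypothetical, not the KNUTS-DB schema.
REQUIRED_FIELDS = ("name", "smiles", "inchikey")


def clean_compounds(records):
    """Return complete records only, keeping the first record per InChI key."""
    seen = set()
    cleaned = []
    for rec in records:
        # An entry counts as incomplete if a required field is missing or empty.
        if any(not rec.get(field) for field in REQUIRED_FIELDS):
            continue
        key = rec["inchikey"].strip().upper()
        if key in seen:
            continue  # same structure already collected from another source
        seen.add(key)
        cleaned.append(rec)
    return cleaned


# Placeholder records (made-up keys, not real InChI keys).
records = [
    {"name": "compound A", "smiles": "CCO", "inchikey": "KEY-A", "source": "source 1"},
    {"name": "compound A (alias)", "smiles": "OCC", "inchikey": "key-a", "source": "source 2"},
    {"name": "compound B", "smiles": "", "inchikey": "KEY-B", "source": "source 1"},
    {"name": "compound C", "smiles": "CC(=O)O", "inchikey": "KEY-C", "source": "source 3"},
]
cleaned = clean_compounds(records)
print([rec["name"] for rec in cleaned])
```

Keeping the first record per key is only one possible policy; a real pipeline might instead merge the source annotations of duplicate entries.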

Similarity search

The similarity search can be performed using either a name or a SMILES string as the query. If a name is used, the corresponding SMILES string is obtained from PubChem. The SMILES string is then converted into a MACCS fingerprint, and a Tanimoto similarity score is calculated against all compounds in the database. The results are shown in a separate table. It is possible to click on entries for further information about molecular functions, biological processes, cellular components and pathways of the protein.
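The scoring step can be sketched as follows. This is an illustrative Python example, not the platform's code: fingerprints are represented as sets of on-bit indices (in practice they would be generated from SMILES by a cheminformatics toolkit), and the compound names and bit positions are made up.

```python
# Minimal sketch of Tanimoto ranking on bit fingerprints.
# Fingerprints are modelled as sets of on-bit indices; names and bits are made up.


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_by_similarity(query_fp, library):
    """Score every library compound against the query, most similar first."""
    scores = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


query = {1, 5, 42, 160}
library = {
    "compound A": {1, 5, 42, 160},
    "compound B": {1, 5, 99},
    "compound C": {7, 8},
}
ranking = rank_by_similarity(query, library)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A score of 1 means the two fingerprints have identical on-bits, and a score of 0 means they share none.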

Molecular targets

The protein interaction data were derived from KEGG, BindingDB and ChEMBL by matching the compound InChI keys to database-specific IDs and using web scraping techniques. Since ChEMBL and BindingDB also contain data about binding experiments without successful binding, the data was filtered for well-recorded binding entries to obtain a reliable dataset. Entries with inconclusive activity comments or comments indicating no binding were filtered out as well. Using information from UniProt, the data was restricted to reviewed proteins only. Protein interaction data on nuts can be retrieved using the protein-interaction tab of the database, including information on UniProt ID, enzyme names, protein name, organism and taxonomy, and can be filtered by interaction type and value. It is possible to click on entries for further information about molecular functions, biological processes, cellular components and pathways of the protein.
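The filtering logic described above can be sketched in Python as shown below. The exact curation rules are not given here, so the comment markers, field names and protein identifiers are all hypothetical and serve only to show the shape of such a filter.

```python
# Minimal sketch of the filtering idea; all markers, fields and IDs are hypothetical.
EXCLUDED_COMMENTS = ("inconclusive", "inactive", "not active", "no binding")


def keep_interaction(record, reviewed_proteins):
    """Keep a record only if it has a value, an acceptable comment and a reviewed protein."""
    comment = (record.get("activity_comment") or "").lower()
    if any(marker in comment for marker in EXCLUDED_COMMENTS):
        return False  # inconclusive result or no binding reported
    if record.get("value") is None:
        return False  # no recorded interaction value
    return record["uniprot_id"] in reviewed_proteins


# Placeholder data: made-up identifiers and values.
reviewed = {"P00001", "P00003"}
records = [
    {"uniprot_id": "P00001", "value_type": "IC50", "value": 120.0, "activity_comment": "Active"},
    {"uniprot_id": "P00001", "value_type": "Ki", "value": 5000.0, "activity_comment": "Inconclusive"},
    {"uniprot_id": "A00002", "value_type": "IC50", "value": 30.0, "activity_comment": None},
    {"uniprot_id": "P00003", "value_type": "IC50", "value": None, "activity_comment": ""},
]
kept = [rec for rec in records if keep_interaction(rec, reviewed)]
print(kept)
```

Here `reviewed` stands in for the set of reviewed UniProt entries; how that set is obtained is outside the scope of this sketch.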

User selected data set

The user-defined data set page enables the user to build an individual data set. Data can be added from the ‘Full data set’, ‘Protein interactions’ and ‘Pathway interactions’ subpages. On the ‘User data set’ page, this custom data can then be inspected and downloaded. Additionally, some statistics on the current state are shown as plots of the number of distinct compounds, protein interactions and pathway interactions, itemized by taxonomic kingdom and selected nut.

Analysis and graphical plotting

The platform allows the user to perform some basic exploratory data analysis. The user can choose data either from the whole data set of the platform or from the individual user-defined data set. It is also possible to select data by interaction value type and to apply a threshold. In general, data can be selected by nut, compound, protein, pathway and interaction value type. The results of the analysis are returned as a plot and as an interactive data table. The pathway and gene ontology (GO-Term) enrichment analysis enables the user to identify over-represented pathways and GO-Terms in user-defined gene lists. Due to implementation-related restrictions, the pathway and GO-Term enrichment can only be performed for human proteins. The clustered heatmap plot is a valuable tool to inspect the compound-protein relationship and identify interaction clusters, e.g. proteins that are affected by multiple compounds and vice versa. Hierarchical clustering with Spearman distance is used, and the coloring corresponds to the interaction value. The network plot is another way to inspect the compound-protein relationship. Blue nodes represent compounds and orange nodes represent proteins. Compound and protein nodes are connected if there is an interaction. The size of a node corresponds to its number of distinct interactions, and the thickness of a link to the interaction value (the lower the IC50, AC50, Ki, ..., the thicker the link). It is possible to zoom in and out, and the nodes are movable to get a clear view of names and connections.
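To illustrate what "over-represented" means in an enrichment analysis, the following Python sketch uses a one-sided hypergeometric test, a common choice for this kind of analysis. Which statistical test the platform actually applies is not stated here, so the test, the function names and the protein and pathway labels are assumptions for illustration.

```python
# Minimal sketch of over-representation analysis with a hypergeometric test.
# The choice of test and all names below are assumptions, not the platform's code.
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when drawing n proteins from N, of which K belong to the pathway."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def enrich(selected, background, pathways):
    """Return (pathway, hits, p-value) tuples sorted by ascending p-value."""
    results = []
    for name, members in pathways.items():
        members = members & background
        hits = len(selected & members)
        p = hypergeom_pvalue(hits, len(selected), len(members), len(background))
        results.append((name, hits, p))
    return sorted(results, key=lambda row: row[2])


# Placeholder protein and pathway labels.
background = {f"P{i}" for i in range(1, 21)}
pathways = {
    "pathway X": {"P1", "P2", "P3", "P4", "P5"},
    "pathway Y": {f"P{i}" for i in range(6, 16)},
}
selected = {"P1", "P2", "P3", "P6"}
results = enrich(selected, background, pathways)
for name, hits, p in results:
    print(f"{name}: hits={hits}, p={p:.4f}")
```

The sketch reports raw p-values; when many pathways or GO-Terms are tested at once, a multiple-testing correction would normally be applied on top.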

Case study

The part of the analysis that functioned as a starting point to this insight can now also be performed using our new platform. One possible approach is to filter for human pathways with interactions with peanut ingredients and an “Immune System” entry in the “Pathway Interaction” page. Using “Add Data to user data set”, a custom data set is created which also includes the corresponding proteins and their interactions with nut ingredients. This data set is then selected in the “Network Plot” page and restricted to IC50 values below 10,000 nM to generate a network plot showing potential inhibitory influences (Figure 5). This plot reveals a cluster of fatty acids with a strong interaction with the peroxisome proliferator-activated receptor alpha (PPAR-alpha), indicated by the thickness of the lines. It is now possible to inspect the corresponding data in the “Data” tab of the “Network Plot” page, restricting it to PPAR-alpha, and to download the IC50 values if needed. For further information about the involved proteins, links to UniProt are provided. An overview of the pathways of PPAR-alpha can be found in the “Full data set” page, using the corresponding columns to filter for PPAR-alpha, peanut and Homo sapiens. Here we can also find the Toll-Like Receptor Pathway 1, including further information and links.
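The network step of this workflow (restricting to IC50 values below 10,000 nM and sizing nodes by their number of interactions) can be mimicked on downloaded data with a short Python sketch. The record layout, the compound labels and all values are made up for illustration, and the mapping from value to line thickness (9 minus log10 of the value in nM, i.e. a pIC50-like scale) is an assumption rather than the platform's actual scaling.

```python
# Minimal sketch: filter interactions by type and cutoff, then derive node sizes.
# Record layout, labels and values are made up; edge_width is an assumed scaling.
from collections import defaultdict
from math import log10


def build_network(interactions, value_type="IC50", max_value=10_000.0):
    """Return (edges, degree) for interactions of one type below the cutoff (nM)."""
    edges = [
        (row["compound"], row["protein"], row["value"])
        for row in interactions
        if row["value_type"] == value_type and row["value"] < max_value
    ]
    partners = defaultdict(set)
    for compound, protein, _ in edges:
        partners[compound].add(protein)
        partners[protein].add(compound)
    # Node size is taken as the number of distinct interaction partners.
    degree = {node: len(linked) for node, linked in partners.items()}
    return edges, degree


def edge_width(value_nm):
    """Lower values give thicker links: 9 - log10(value in nM)."""
    return 9.0 - log10(value_nm)


interactions = [
    {"compound": "fatty acid 1", "protein": "PPAR-alpha", "value_type": "IC50", "value": 50.0},
    {"compound": "fatty acid 2", "protein": "PPAR-alpha", "value_type": "IC50", "value": 800.0},
    {"compound": "fatty acid 3", "protein": "PPAR-alpha", "value_type": "IC50", "value": 25000.0},
    {"compound": "fatty acid 1", "protein": "protein X", "value_type": "Ki", "value": 10.0},
]
edges, degree = build_network(interactions)
for compound, protein, value in edges:
    print(f"{compound} -> {protein}: width {edge_width(value):.2f}")
print(degree)
```

With this placeholder data, the third record is dropped by the cutoff and the fourth by the value type, so the protein node ends up with two connections.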

Citation

Emanuel Kemmler, Margitta Worm, Robert Preissner and Priyanka Banerjee. KNUTS-DB - a data-driven knowledge database for NUTS. Toxicol Mech Methods 2025 Apr 29:1-8. DOI: 10.1080/15376516.2025.2496752.

PMID: 40269639

KNUTS-DB Team

Institute for Physiology
Structural Bioinformatics Group
Philippstr. 12
10115 Berlin, Germany



Emanuel Kemmler emanuel.kemmler@charite.de

Priyanka Banerjee priyanka.banerjee@charite.de

Allergens

Data about nut allergens. Click on entries for epitopes and further information.

Nut allergens