FAQ

General FAQ

BIOLOGICAL NETWORKS

Three networks are available on the website: STRING v11.0 Homo sapiens, STRING v10.0 Homo sapiens, and MeTeOR.

Name	Network coverage	Notes	Citations
STRING v11.0 Homo sapiens	19,344 proteins	A protein-protein interaction (PPI) network for Homo sapiens genes/proteins. The interactions cover experimental data, computational predictions, and text mining. This is version 11 of STRING DB.	Szklarczyk D., et al. (2019). STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research, 47(D1), D607–D613.
STRING v10.0 Homo sapiens	19,236 proteins	A Homo sapiens PPI network built from the same kinds of sources as STRING v11.0. This is version 10 of STRING DB.	Szklarczyk D., et al. (2015). STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Research, 43(Database issue), D447–D452.
MeTeOR	11,147 proteins; 4,773 diseases; 71,481 chemicals	A multimodal text-mining network of genes, diseases, and chemicals. It robustly captures biological knowledge by comprehensively aggregating co-occurrences of MeSH terms from 22 million MEDLINE publications up to 2017.	Wilson S., et al. (2018). Automated literature mining and hypothesis generation through a network of Medical Subject Headings. bioRxiv, 403667.

WEBSITE

All you need to start is two lists of genes of interest. nDiffusion analysis answers whether the two groups of genes are functionally related, based on how they are connected to each other in a biological network. You may select any of the networks available on the website (STRING v11.0 Homo sapiens, STRING v10.0 Homo sapiens, and MeTeOR) or upload a custom network. Please refer to the following presentation for detailed instructions.

After you submit a job, you will be given a job ID. Enter this job ID on the job search page to retrieve the results of your job. If you opt in to email notification when submitting the job, you can follow the link in the email to go directly to the result page.

After you enter the job ID, the main results are shown on the web page. You can also download more detailed results through the “Download complete report” link.

You may upload your own network in the format described on the website. Please make sure that the proteins/genes in your lists of interest use the same nomenclature as the proteins/genes in the uploaded network. If you choose a multimodal network, please also upload a file listing the gene/protein nodes in that network.
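
Because mismatched nomenclature silently drops genes from the analysis, it can help to check your lists against the network before uploading. The sketch below is a hypothetical helper, not part of nDiffusion: it assumes a tab-separated edge list with two node names per line, which may differ from the format the website actually requires. The names read_nodes and missing_genes are illustrative only.

```python
# Hypothetical pre-upload check. The tab-separated "nodeA<TAB>nodeB" edge list
# assumed here may differ from the format the nDiffusion website requires.
from typing import Iterable, Optional, Set


def read_nodes(edge_lines: Iterable[str]) -> Set[str]:
    """Collect every node name that appears in a tab-separated edge list."""
    nodes: Set[str] = set()
    for line in edge_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"expected two tab-separated nodes: {line!r}")
        nodes.update(fields[:2])  # ignore any extra columns, e.g. edge weights
    return nodes


def missing_genes(genes: Iterable[str], nodes: Set[str],
                  gene_nodes: Optional[Set[str]] = None) -> Set[str]:
    """Return genes of interest that match no gene node in the network.

    For a multimodal network, pass gene_nodes (the separate list of gene/protein
    nodes) so that matches against disease or chemical nodes are not counted.
    """
    allowed = nodes if gene_nodes is None else nodes & gene_nodes
    return {g for g in genes if g not in allowed}
```

For example, if the network uses official gene symbols such as TP53, an alias like p53 in your list would be reported as missing. A real edge-list file can be passed directly, since iterating over an open file yields its lines.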

RESULT INTERPRETATIONS

We assess how well two gene groups are connected to each other by how well diffusion from one group recovers the genes of the other group. Predictive performance is measured by the area under the receiver operating characteristic curve (AUROC). After diffusion from the seed genes, all other network nodes are ranked by the diffusion signal they receive; genes that receive more signal are predicted to belong to the other group. An AUROC above 0.5 indicates that the two gene groups are more connected to each other than uniformly random genes, i.e. random genes chosen regardless of their connectivity degrees.
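
To make the ranking-then-AUROC idea concrete, here is a minimal sketch. It assumes random walk with restart (RWR) as the diffusion step and a restart probability of 0.5; both are stand-ins chosen for illustration, and the nDiffusion pipeline's own diffusion method and parameters may differ. The AUROC is computed in its Mann-Whitney form: the probability that a gene of the other group outscores a non-member, with ties counted as one half.

```python
# Sketch only: random walk with restart (RWR) is used here as a stand-in for
# the pipeline's diffusion step; the restart probability is an assumed value.
from typing import Dict, List, Set


def diffuse(adj: Dict[str, List[str]], seeds: Set[str],
            restart: float = 0.5, iters: int = 100) -> Dict[str, float]:
    """Spread signal from the seed genes over an undirected graph via RWR.

    `adj` must list every node as a key, with each edge in both directions.
    """
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in adj}
    p = dict(p0)
    for _ in range(iters):
        nxt = {n: restart * p0[n] for n in adj}  # jump back to the seeds
        for n, nbrs in adj.items():
            for m in nbrs:  # walk: split the remaining signal among neighbors
                nxt[m] += (1 - restart) * p[n] / len(nbrs)
        p = nxt
    return p


def auroc(scores: Dict[str, float], positives: Set[str], exclude: Set[str]) -> float:
    """AUROC of ranking nodes (minus `exclude`) by score, `positives` as true hits.

    Mann-Whitney form: the chance that a random positive outscores a random
    negative, counting ties as 1/2. Quadratic time; fine for a sketch.
    """
    ranked = [n for n in scores if n not in exclude]
    pos = [scores[n] for n in ranked if n in positives]
    neg = [scores[n] for n in ranked if n not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative node")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

On the path A1-B1-X1-X2 with seed A1 and B1 as the other group, B1 receives more signal than X1 and X2, so auroc(diffuse(adj, {"A1"}), {"B1"}, {"A1"}) returns 1.0. The seeds are excluded from the ranking because they would trivially receive the most signal.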

Z-scores are computed for the observed area under the ROC curve (AUROC) or the precision-recall curve (AUPRC) against the distribution of the same area obtained with random genes. The main result page shows AUROCs and their z-scores against degree-matched random genes, i.e. random genes whose connectivity degrees are similar to those of your genes of interest. Z-scores above 2.0 suggest that the two gene groups are significantly more connected to each other than random. We recommend degree-matched z-scores over uniform z-scores because they are more stringent.
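
The z-score is the usual standard score, z = (observed area - mean of random areas) / standard deviation of random areas. The sketch below illustrates degree-matched randomization under stated assumptions: each gene is replaced by a random node of exactly the same degree (the pipeline may instead use degree bins), and both gene lists are randomized. `area_fn` is a hypothetical placeholder for whatever function computes the area (AUROC or AUPRC) from two gene sets. The significance rule applies the thresholds stated in this FAQ.

```python
# Sketch only: exact-degree matching and randomizing both gene lists are
# assumptions; nDiffusion may bin degrees or randomize differently.
import random
import statistics
from typing import Callable, Dict, List, Sequence, Set


def degree_matched_sample(genes: Sequence[str], degree: Dict[str, int],
                          rng: random.Random) -> Set[str]:
    """Draw one distinct random node per gene, with the same degree as that gene."""
    by_degree: Dict[int, List[str]] = {}
    for node, d in degree.items():
        by_degree.setdefault(d, []).append(node)
    picked: Set[str] = set()
    for g in genes:
        pool = [n for n in by_degree.get(degree[g], []) if n not in picked]
        if not pool:  # no unused node of that exact degree: fall back to any unused node
            pool = [n for n in degree if n not in picked]
        picked.add(rng.choice(pool))
    return picked


def z_score(observed: float, random_values: Sequence[float]) -> float:
    """Standard score of the observed area against areas from random gene sets."""
    return (observed - statistics.mean(random_values)) / statistics.stdev(random_values)


def degree_matched_z(observed: float, genes_a: Sequence[str], genes_b: Sequence[str],
                     degree: Dict[str, int],
                     area_fn: Callable[[Set[str], Set[str]], float],
                     n_random: int = 100, seed: int = 0) -> float:
    """z-score of `observed` against `area_fn` on degree-matched random gene lists."""
    rng = random.Random(seed)
    values = [area_fn(degree_matched_sample(genes_a, degree, rng),
                      degree_matched_sample(genes_b, degree, rng))
              for _ in range(n_random)]
    return z_score(observed, values)


def is_significant(auroc: float, z: float) -> bool:
    """Thresholds stated in this FAQ: AUROC > 0.5 and z-score > 2.0."""
    return auroc > 0.5 and z > 2.0
```

Matching on degree matters because highly connected genes tend to receive more diffusion signal regardless of biology; comparing against random genes of similar degree removes that advantage, which is why the degree-matched z-score is the more stringent test. If every random area is identical, the standard deviation is zero and the z-score is undefined.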

The two gene lists are significantly connected to each other in the network when the AUROC is greater than 0.5 and the z-score against random is greater than 2.0. The higher the AUROC and z-score, the more strongly the two gene lists are connected to each other relative to random, which suggests that they are closely functionally related.

A large AUROC reflects that the genes in the two groups are well connected to each other overall. However, one or a few genes may account for most of the observed connectedness. To pinpoint the genes that contribute the most, refer to “Diffusion score (Ranking)” in the files in the “ranking” folder of the downloaded results. Diffusion scores reflect how much diffusion signal each network node receives after diffusion from the seed genes. The final column in these files indicates which of these network nodes are your genes of interest.
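
Once the diffusion scores and your genes of interest have been read from a ranking file, sorting them shows which genes carry most of the signal. The sketch below works on an in-memory dictionary rather than parsing the file, because the exact column layout of the ranking files is not specified here. The function name top_contributors and the "share" measure (a gene's fraction of the total score received by all genes of interest) are illustrative choices, not outputs of nDiffusion.

```python
# Illustrative helper, not part of nDiffusion: ranks genes of interest by
# diffusion score and reports each gene's share of the group's total score.
from typing import Dict, Iterable, List, Tuple


def top_contributors(scores: Dict[str, float], genes_of_interest: Iterable[str],
                     k: int = 5) -> List[Tuple[str, float, float]]:
    """Return up to k (gene, score, share) tuples, highest diffusion score first.

    Genes absent from `scores` (e.g. not in the network) are skipped; ties are
    broken alphabetically so the output is deterministic.
    """
    present = sorted(g for g in set(genes_of_interest) if g in scores)
    total = sum(scores[g] for g in present)
    ranked = sorted(present, key=lambda g: scores[g], reverse=True)  # stable sort
    return [(g, scores[g], scores[g] / total if total else 0.0) for g in ranked[:k]]
```

If one gene holds most of the share while the rest score near zero, the connection between the two lists likely rests on that gene rather than on the group as a whole.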

OTHER IMPLEMENTATIONS

You may upload a multimodal network, as long as the proteins/genes in the network and in the lists of interest use the same nomenclature. Please make sure to also upload a separate file that includes all of the genes in your network.

The pipeline codebase is written in Python and is publicly available at https://github.com/mpham93/nDiffusion. If you have any questions about the code, please contact Minh Pham (minh.pham@bcm.edu) or Olivier Lichtarge (lichtarge@bcm.edu).

CITING OUR WORK

You don’t need to ask our permission, but we would appreciate it if you cite us (see References).

References

Pham M, Lichtarge O. Graph-based information diffusion method for prioritizing functionally related genes in protein-protein interaction networks. Pac Symp Biocomput. 2020;25:439-450. PMID: 31797617. https://www.worldscientific.com/doi/10.1142/9789811215636_0039

For the rationale and proof of concept of the graph-based diffusion method to prioritize and validate functionally related genes in networks, please cite the reference above (Pham and Lichtarge, 2020). For using the website and its results, please cite: Bioinformatics (in preparation).