The analysis of networks

The transcriptional network
From the early days of my own involvement in bioinformatics research, which first focussed on transcriptional regulation, the idea of a transcriptional network was obvious to me: every transcription factor (TF) is encoded by a gene, which is itself under the control of transcription factors. However, it took many years to accumulate a set of data, documented in the TRANSFAC database, that was large enough to construct such a network. Even so, rough estimates of this network's coverage put the known transcription factor binding sites (TFBSs) at no more than about 1% of what we might expect for, e.g., the human genome (Wingender et al., J. Biosci. 32, 169-180, 2006; PubMed 17426389).

More recently, high-throughput technologies such as ChIP-chip and ChIP-seq have provided significantly larger data sets, but they may have their own pitfalls (see Kel et al., Genome Biol. 9, R36, 2008; PubMed 18291023). Moreover, our earlier attempts to enrich the experimentally known network with predicted TFBSs proved insufficiently reliable.

For these reasons, we are presently combining "classical" (e.g., pattern-based) predictions with high-throughput data and comparative genomic analyses (see the related work for the Net2Drug project). Highly conserved, high-affinity predicted TFBSs, so-called "seed sites", appear to constitute a reliable transcriptional network (TN). "Paralogous expansion", i.e. copying the TF-target relations to all members of a TF (sub-)family, yields an extended TN, which can be filtered with known gene expression signatures to obtain tissue-specific TNs. These tissue-specific TNs exhibit interesting characteristics (Haubrock et al., BMC Syst. Biol. 6 Suppl. 2, S15, 2012; PubMed 23282021).
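The three steps described above (seed-site selection, paralogous expansion, tissue filtering) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the cited work: the TF and gene names, the affinity and conservation scores and thresholds, the family assignments, and the rule that a tissue-specific edge requires both the TF and its target to be in the tissue's expression signature are all hypothetical.

```python
from collections import defaultdict

# Hypothetical predicted TFBSs: (TF, target gene, affinity score, conservation score).
# Names, scores and thresholds are illustrative assumptions only.
PREDICTIONS = [
    ("TF_A1", "GENE_X", 0.95, 0.90),
    ("TF_A1", "GENE_Y", 0.60, 0.95),  # low affinity: not a seed site
    ("TF_B1", "GENE_Z", 0.92, 0.40),  # poorly conserved: not a seed site
    ("TF_B1", "TF_A2", 0.97, 0.88),   # a TF gene can itself be a target
]

# Hypothetical assignment of TFs to (sub-)families.
FAMILIES = {"TF_A1": "FAM_A", "TF_A2": "FAM_A", "TF_B1": "FAM_B", "TF_B2": "FAM_B"}


def seed_sites(predictions, min_affinity=0.9, min_conservation=0.8):
    """Keep only high-affinity, highly conserved predictions as TF -> target edges."""
    return {(tf, target) for tf, target, affinity, conservation in predictions
            if affinity >= min_affinity and conservation >= min_conservation}


def paralogous_expansion(edges, families):
    """Copy every TF -> target edge to all members of the TF's (sub-)family."""
    members = defaultdict(set)
    for tf, family in families.items():
        members[family].add(tf)
    expanded = set()
    for tf, target in edges:
        # A TF without a family assignment keeps only its own edge.
        paralogs = members.get(families.get(tf), {tf})
        expanded.update((paralog, target) for paralog in paralogs)
    return expanded


def tissue_network(edges, signature):
    """Keep edges whose TF and target both belong to a tissue's expression signature."""
    return {(tf, target) for tf, target in edges
            if tf in signature and target in signature}


if __name__ == "__main__":
    extended = paralogous_expansion(seed_sites(PREDICTIONS), FAMILIES)
    print(sorted(extended))
    # [('TF_A1', 'GENE_X'), ('TF_A2', 'GENE_X'), ('TF_B1', 'TF_A2'), ('TF_B2', 'TF_A2')]
    print(sorted(tissue_network(extended, {"TF_A2", "GENE_X", "TF_B1"})))
    # [('TF_A2', 'GENE_X'), ('TF_B1', 'TF_A2')]
```

Two seed sites become four edges after expansion, and the hypothetical tissue signature keeps only the two edges whose endpoints are both expressed.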

The gene regulatory network
Together with partners at the University of Nanjing (Prof. Jin Wang), we are presently attempting to expand the transcriptional network by adding post-transcriptional events, especially those exerted by microRNAs (miRNAs). In its start phase, this project was funded by the German Ministry of Education and Research as a German-Chinese scientific-technological cooperation. Since then, a combined network of predicted TF-target relations (including miRNA genes as targets) and miRNA-target relations (including TF genes as targets) has been constructed and initially analyzed (Li et al., Bioinformatics 28, i509-514, 2012; PubMed 22962474). Interestingly, the tissue-specific subgraphs turned out to show quite distinct features.
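A combined network of this kind can be represented as one directed graph whose edges carry a type label (transcriptional or post-transcriptional). The minimal sketch below merges the two relation types, extracts a tissue-specific induced subgraph, and counts, as one example of a feature that can be compared between tissues, TF-miRNA feedback loops (a TF regulating a miRNA gene that in turn targets that TF). The node names, expression sets and the choice of feature are hypothetical and not necessarily what was examined in the cited study.

```python
from collections import Counter

# Hypothetical edge lists; node names are illustrative only.
TF_TARGETS = [("TF1", "MIR_A"), ("TF1", "GENE1"), ("TF2", "GENE2"), ("TF2", "MIR_B")]
MIRNA_TARGETS = [("MIR_A", "TF1"), ("MIR_A", "GENE2"), ("MIR_B", "GENE1")]


def build_network(tf_edges, mirna_edges):
    """Merge both relation types into one directed graph {(source, target): edge type}."""
    graph = {edge: "transcriptional" for edge in tf_edges}
    graph.update({edge: "post-transcriptional" for edge in mirna_edges})
    return graph


def tissue_subgraph(graph, expressed):
    """Induced subgraph on the nodes expressed in one tissue."""
    return {(s, t): kind for (s, t), kind in graph.items()
            if s in expressed and t in expressed}


def summarize(graph):
    """Count edges per type and TF <-> miRNA feedback loops (one example feature)."""
    counts = Counter(graph.values())
    counts["tf_mirna_feedback_loops"] = sum(
        1 for (s, t), kind in graph.items()
        if kind == "transcriptional" and graph.get((t, s)) == "post-transcriptional")
    return dict(counts)


if __name__ == "__main__":
    network = build_network(TF_TARGETS, MIRNA_TARGETS)
    tissue_1 = tissue_subgraph(network, {"TF1", "MIR_A", "GENE2"})
    tissue_2 = tissue_subgraph(network, {"TF2", "MIR_B", "GENE1", "GENE2"})
    print(summarize(tissue_1))
    # {'transcriptional': 1, 'post-transcriptional': 2, 'tf_mirna_feedback_loops': 1}
    print(summarize(tissue_2))
    # {'transcriptional': 2, 'post-transcriptional': 1, 'tf_mirna_feedback_loops': 0}
```

Even in this toy case, the two hypothetical tissue subgraphs differ in their balance of edge types and in whether a feedback loop survives the filtering.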

The methodology of network analysis
To analyze the global and local features of the transcriptional and gene regulatory networks as well as the signaling and metabolic networks, we have computed a number of topological parameters for all of these networks. One of our main interests was to identify network components (nodes, edges, or subgraphs/motifs) that are of particular importance for the coherence of the network(s) under study. For this purpose, Anatolij Potapov from my department at UMG has devised a new parameter, the Pairwise Disconnectivity Index (PDI), which pinpoints the most critical network parts (Potapov et al., BMC Bioinformatics 9, 227, 2008; PubMed 18454847). Björn Goemann, who did his PhD in my department under Anatolij's supervision, extended the approach to topological network motifs (Goemann et al., BMC Syst. Biol. 3, 53, 2009, PubMed 19454001; Goemann et al., Genome Inform. 23, 32-45, 2009, PubMed 20180260). There is now a nice service available on our University server that calculates PDI values across a submitted network and visualizes the network, highlighting the most critical components (nodes, edges, or patterns); we have called it DiVa (Disconnectivity Valuation).
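As a rough illustration of the PDI idea, the sketch below follows our reading of the definition in Potapov et al. (2008): with N0 the number of ordered node pairs connected by at least one directed path, and N_v the number of those pairs still connected after deleting node v, PDI(v) = (N0 - N_v) / N0; the edge PDI is analogous, deleting a single edge instead. Treating pairs that involve the deleted node as disconnected is an assumption of this sketch, as are the function names and the toy network. It is not the DiVa code and does not cover the motif extension.

```python
from collections import deque


def connected_pairs(nodes, edges):
    """Number of ordered pairs (i, j), i != j, with a directed path from i to j."""
    adjacency = {n: [] for n in nodes}
    for source, target in edges:
        adjacency[source].append(target)
    total = 0
    for start in nodes:
        seen = {start}
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            for nxt in adjacency[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        total += len(seen) - 1  # do not count `start` itself
    return total


def node_pdi(nodes, edges):
    """PDI(v) = (N0 - N_v) / N0; pairs involving v count as disconnected (assumption)."""
    n0 = connected_pairs(nodes, edges)
    if n0 == 0:
        raise ValueError("network contains no connected pairs")
    pdi = {}
    for v in nodes:
        rest = [n for n in nodes if n != v]
        kept = [(s, t) for s, t in edges if v not in (s, t)]
        pdi[v] = (n0 - connected_pairs(rest, kept)) / n0
    return pdi


def edge_pdi(nodes, edges):
    """PDI(e) = (N0 - N_e) / N0 after deleting edge e only."""
    n0 = connected_pairs(nodes, edges)
    if n0 == 0:
        raise ValueError("network contains no connected pairs")
    return {e: (n0 - connected_pairs(nodes, [x for x in edges if x != e])) / n0
            for e in edges}


if __name__ == "__main__":
    nodes = ["A", "B", "C", "D"]
    edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]
    by_node = node_pdi(nodes, edges)
    by_edge = edge_pdi(nodes, edges)
    print(max(by_node, key=by_node.get), round(by_node["C"], 3))  # C 0.833
    print(by_edge[("A", "C")], by_edge[("C", "D")])  # 0.0 0.5
```

In this toy network, node C is the most critical (5 of the 6 connected pairs are lost without it), whereas deleting edge A->C disconnects nothing (PDI 0) because the path A->B->C remains.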