Welcome to the Söding Group
The massively parallel sequencing technologies now being developed have led to a drop in sequencing costs of an order of magnitude every three years. This trend will continue and is leading to an increase by a factor of two per year in the number of protein sequences in public databases (20 million in mid-2009). Present-day methods are reaching the limits of their ability to organize and fully exploit this vast amount of evolutionary information.
Our first goal is to develop methods for sequence searching, sequence alignment, protein structure prediction, and functional annotation that can make use of the coming deluge of sequence data. We are developing novel search algorithms that aim to achieve a speed-up of two orders of magnitude. We also need to make better use of the diversity of available sequences to improve the accuracy of alignments and the sensitivity of remote protein homolog detection, both of which are crucial bottlenecks for protein structure and function prediction. We apply these methods in collaborations, for large-scale genome analysis, and to study protein evolution, in particular the question of how protein domains originated.
Our second goal revolves around methods for genome and promoter analysis to predict regulatory elements and to understand the molecular basis of transcription initiation and regulation of gene expression. We collaborate with experimental labs in the Gene Center to validate our results, to propose new experiments, and to assist in data analysis. We will also improve methods for multiple genome alignment in order to better delineate functional elements through their conservation signature, drawing heavily on the riches of comparative sequence data that will become available in the future.
In our work, we are pursuing an information-driven approach that aims to exploit available information as fully as possible and to integrate information from heterogeneous sources. For this purpose, we are developing methods based on (Bayesian) statistical modeling and advanced machine learning techniques.