Oberwolfach Seminar: Bioinformatics Approaches for Finding cis-regulatory Motifs and Modules

October 9th - 15th, 2005
Wolfgang Huber, Cambridge, UK
Xiaole Shirley Liu, Cambridge, Mass.
Terry Speed, Melbourne/Berkeley
Although (almost) every human cell contains a complete genome, and so has the capacity, in principle, to express every gene, cells at different stages of human development, in different tissues, or in the same tissue under different conditions typically exhibit very different patterns of gene expression. The process leading to a gene being transcribed in a cell is a complex one, involving many proteins sending and receiving signals which, under certain circumstances, lead to the initiation of transcription at the gene's transcription start site (TSS). This process involves one or more proteins called transcription factors (TFs), together with what are called co-factors, binding to the genome "near" the TSS, at places called binding sites (TFBSs). This aspect of the process is called cis-regulation and the proteins involved are called regulatory elements.
The general theme of this seminar is finding these regulatory elements and the associated binding sites. In doing so, we make use of cis-regulatory motifs and modules (i.e., clusters of motifs), these being regions of the genome that interact with the transcription machinery. A variety of types of genomic data and a wide range of computational methods have been brought to bear on this problem, and this seminar will review them. The data include gene expression microarray data, chromatin immunoprecipitation (ChIP) data, and genomic sequence data for one or more species.
In this context, "finding" includes "detecting", when the nature of the motif is known, and "discovering", when the motif is not known but needs to be inferred. When we speak of "finding" clusters of motifs, there are three options: detecting clusters of known motifs of known form, discovering novel clusters of known motifs, and discovering novel clusters of novel motifs.
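To make the "detecting" case concrete, the following is a minimal sketch of scanning a sequence for a known motif represented as a position weight matrix. It is an illustration only, not a method prescribed by the seminar: the count matrix, the uniform background, the pseudocount, and the function names `log_odds_matrix`, `scan`, and `best_hit` are all assumptions made for this example, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"

# Hypothetical base counts for a 4-bp motif, one dict per motif position.
# These numbers are invented for illustration, not taken from real data.
COUNTS = [
    {"A": 1, "C": 1, "G": 0, "T": 8},
    {"A": 9, "C": 0, "G": 1, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
    {"A": 8, "C": 1, "G": 1, "T": 0},
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in BASES
        })
    return matrix


def scan(sequence, matrix):
    """Score every forward-strand window of motif width; return (start, score) pairs."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[j][base] for j, base in enumerate(window))
        hits.append((start, score))
    return hits


def best_hit(sequence, matrix):
    """Return the (start, score) pair with the highest log-odds score."""
    return max(scan(sequence, matrix), key=lambda hit: hit[1])


PWM = log_odds_matrix(COUNTS)
start, score = best_hit("GGCGTATACCGG", PWM)
print(start, round(score, 2))
```

In this toy sequence the highest-scoring window starts at index 4, where the consensus TATA occurs. In practice one would also scan the reverse complement and choose a score threshold; the "discovering" case differs in that the matrix itself has to be estimated from the data.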
In carrying out the task, a wide range of statistical methods have been found to be fruitful, as has the biological notion of evolution, more specifically, of comparative genomics. Our approach will be to explain the biological problem and the technology leading to the data, and then turn to the mathematical, statistical and computational methods involved in doing the "finding". The statistical topics we plan to discuss in this context include:
Lecture Program:
Theme: Background and motivation for the week.
  1. Introduction to molecular biology
  2. Microarrays and low level analysis
  3. Transcriptional regulation in eukaryotes
Computer lab: cDNA and Affymetrix microarray analysis. Papers: Spellman et al. 1998 Mol Biol Cell. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Cho et al. 1998 Mol Cell. A genome-wide transcriptional analysis of the mitotic cell cycle.

Theme: Motif and module representation and detection
  1. Introduction to motif representation and detection
  2. Profile and other HMMs
  3. Representation and detection of modules
Computer lab: Motif-detection (as distinct from discovery) programs. Retrieve intergenic sequences, run web motif finders. Paper: Tompa et al. 2005 Nat Biotech. Assessing computational tools for the discovery of transcription factor binding sites.

Theme: Motif discovery - sequence-based methods
  1. Methods using the EM-algorithm
  2. Methods using Gibbs samplers
  3. Methods for the discovery of modules

Theme: Motif discovery - using expression and ChIP data
  1. Linear models to discover cis-regulatory modules
  2. Analysis of ChIP-chip experiments
  3. MDScan and Motif Regressor
Computer lab: Revisiting the yeast cell-cycle data and yeast ChIP-chip data for RAP1. Illustration of algorithms from Wednesday and Thursday mornings.

Theme: Bringing everything together (AM)
  1. Motif finding in higher eukaryotes: comparative genomics
  2. Motif clusters and cis-regulatory modules
  3. Human ChIP-chip on tiled arrays
Discussion of papers: Wasserman et al. 2000 Nat Genetics. Human-mouse genome comparisons to locate regulatory sites. Carroll et al. 2005 Cell. Chromosome-Wide Mapping of Estrogen Receptor Binding Reveals Long-Range Regulation Requiring the Forkhead Protein FoxA1. Gupta and Liu. 2005 PNAS. De novo cis-regulatory module elicitation for eukaryotic genomes.
Participants should have a basic knowledge of the biology of gene expression and gene regulation, although this will be briefly reviewed. The basics of statistics and probability, including stochastic processes, will also be assumed.
Deadline for applications:
September 1, 2005

The seminars take place at the Mathematisches Forschungsinstitut Oberwolfach. The number of participants is restricted to 24. The Institute covers accommodation and food. Travel expenses cannot be reimbursed. Applications including

should be sent as hard copy or by e-mail (.ps or .pdf file) to:

Prof. Dr. Gert-Martin Greuel
Universität Kaiserslautern
Fachbereich Mathematik
Erwin Schrödingerstr.
67663 Kaiserslautern, Germany

Mathematisches Forschungsinstitut Oberwolfach   updated: November 3rd, 2004