CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics, 2007, 23(9). Genis Parra, Keith Bradnam, and Ian Korf.
The number of finished and ongoing genome projects is increasing rapidly, and providing a catalog of genes for these new genomes is a key challenge. Obtaining a set of well-characterized genes is a basic requirement in the initial steps of any genome annotation process. An accurate set of genes is needed in order to learn about species-specific properties, to train gene-finding programs, and to validate automatic predictions. Unfortunately, many new genome projects lack comprehensive experimental data to derive a reliable initial set of genes.
In this study, we report a computational method, CEGMA (Core Eukaryotic Genes Mapping Approach), for building a highly reliable set of gene annotations in the absence of experimental data. We define a set of conserved protein families that occur in a wide range of eukaryotes, and present a mapping procedure that accurately identifies their exon–intron structures in a novel genomic sequence. CEGMA uses profile hidden Markov models to ensure the reliability of the gene structures. Our procedure allows one to build an initial set of reliable gene annotations in potentially any eukaryotic genome, even those in draft stages.
Structure-based redesign of the dimerization interface reduces the toxicity of zinc-finger nucleases. Nature Biotechnology, 25:786-793. (Cover article)
The Segal Lab, with collaborator Toni Cathomen at Charité Medical School in Berlin, Germany, describe an important advance in methods for editing the genomes of living cells. Mutagenesis and/or gene correction at a specific locus in the genome can be dramatically stimulated by creating a targeted double-strand break. Currently, the only way to make such a targeted break is by attaching a nuclease to engineered zinc finger DNA-binding proteins, which can be programmed to bind virtually any desired DNA sequence. These zinc finger nucleases (ZFNs) show promise for making highly efficient (up to 80%) targeted knock-outs in plants and animal models for which no knock-out methods currently exist. Similarly, high-efficiency gene correction (up to 20%) shows promise as a new approach to gene therapy. The first zinc finger nuclease will enter clinical trials this year. However, first-generation ZFNs were frequently toxic to cells because of cleavage at off-target sites. By re-engineering the nuclease, we have dramatically reduced off-target cleavage events. These modifications should help to realize the potential of this gene editing approach. A similar study by a commercial developer of ZFNs appears in the same journal.
Genome-Wide Analysis of KAP1 Binding Suggests Autoregulation of KRAB-ZNFs
Henriette O'Geen, Sharon L. Squazzo, Sushma Iyengar, Kim Blahnik, John L. Rinn, Howard Y. Chang, Roland Green, Peggy J. Farnham
We performed a genome-scale chromatin immunoprecipitation (ChIP)-chip comparison of two modifications of histone H3 (trimethylation of lysine 9 [H3me3K9] and trimethylation of lysine 27 [H3me3K27]) in Ntera2 testicular carcinoma cells and in primary human fibroblasts from three different anatomical sources. We found that in each of the cell types the two modifications were differentially enriched at the promoters of the two largest classes of transcription factors: zinc finger (ZNF) genes were bound by H3me3K9, and homeobox genes were bound by H3me3K27. We have previously shown that Polycomb repressive complex 2 is responsible for mediating trimethylation of lysine 27 of histone H3 in human cancer cells. In contrast, there is little overlap between H3me3K9 targets and the sites bound by components of Polycomb repressive complex 2, suggesting that a different histone methyltransferase is responsible for the H3me3K9 modification. Previous studies using in vitro or artificial tethering assays have shown that SETDB1 can trimethylate H3 on lysine 9. SETDB1 is thought to be recruited to chromatin by complexes containing the KAP1 corepressor. To determine whether a KAP1-containing complex mediates trimethylation of the identified H3me3K9 targets, we performed ChIP-chip assays and identified KAP1 target genes using human 5-kb promoter arrays. We found that a large number of ZNF transcription factor genes were bound by both KAP1 and H3me3K9 in normal and cancer cells. To expand our studies of KAP1, we next performed a complete genomic analysis of KAP1 binding using a 38-array tiling set, identifying ~7,000 KAP1 binding sites. The identified KAP1 targets were highly enriched for C2H2 ZNFs, especially those containing Krüppel-associated box (KRAB) domains. Interestingly, although most KAP1 binding sites were within core promoter regions, the binding sites near ZNF genes were greatly enriched within transcribed regions of the target genes.
Because KAP1 is recruited to the DNA via interaction with KRAB-ZNF proteins, we suggest that expression of KRAB-ZNF genes may be controlled via an auto-regulatory mechanism involving KAP1.
Folding free-energy landscape of villin headpiece subdomain from molecular dynamics simulations. PNAS, in-press (published online)
Lei et al. studied the folding of the villin headpiece subdomain (HP35) using molecular dynamics simulations and achieved high-accuracy ab initio folding, reaching structures within 0.46 Å of the native structure. This marks the first time that ab initio simulations have reached this level of accuracy. The simulations also provided a comprehensive picture of the kinetics and thermodynamics of HP35 folding.
Eisen JA et al. "Macronuclear Genome Sequence of the Ciliate Tetrahymena thermophila, a Model Eukaryote." PLoS Biol. 2006 Aug 29;4(9).
In the September issue of PLoS Biology, Jonathan Eisen and colleagues report on the sequencing and analysis of the macronuclear genome of the ciliate Tetrahymena thermophila. This species is a model for studies of the functioning of eukaryotic cells (e.g., telomerase, the enzyme that copies the ends of linear chromosomes and has been implicated in human aging, was discovered in this species). Analysis of the genome has provided insights into the biology and evolution of this species and of eukaryotes in general. In addition, the analysis reveals many pathways that are shared between Tetrahymena and humans but absent from other model species such as yeast.
Pollard KS, Salama SR, et al. "An RNA gene expressed during cortical development evolved rapidly in humans." Nature. 2006 Aug 16.
Pollard and colleagues scanned the human genome for DNA sequences that have been nearly frozen throughout vertebrate evolution but changed rapidly in the human lineage since the human-chimpanzee common ancestor. The most dramatic such human accelerated region (HAR) is part of a novel RNA gene expressed specifically in Cajal-Retzius neurons in the developing human neocortex from 7 to 19 gestational weeks, a critical period for cortical neuron specification and migration.
Segal DJ et al. "Structure of Aart, a Designed Six-Finger Zinc Finger Peptide, bound to DNA." J. Mol. Biol. Aug 2006. doi:10.1016/j.jmb.2006.08.016.
The Segal Lab, in collaboration with crystallographer Nancy Horton at the University of Arizona, present the first crystal structure of an engineered, 6-zinc finger DNA-binding protein bound to DNA. Zinc fingers are one of the most common types of DNA-binding domains found in nature. In the past several years, protein engineering efforts have succeeded in reprogramming their binding specificity, so that a custom DNA-binding protein can now be made to bind almost any desired DNA sequence. Because each zinc finger recognizes about 3 bp, a protein containing 6 zinc finger domains should in theory recognize an 18-bp sequence, long enough to specify a single unique site in the human genome. This study examines how closely that theory matches reality. The protein, Aart, is also unusual in that it recognizes an A-rich binding sequence, and does so with unusually high affinity (7 pM).
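The "unique site" argument is simple arithmetic, sketched below in Python under loudly stated assumptions that do not come from the study: each finger contributes exactly 3 bp, genomic sequence is random with equal and independent base frequencies, both strands are scanned, and the haploid human genome is taken as roughly 3.1 × 10^9 bp. The function names `site_length` and `expected_random_sites` are hypothetical helpers for illustration only.

```python
# Back-of-the-envelope estimate of how often one specific zinc finger target
# site would occur by chance in the human genome.
# Assumptions (illustrative, not from the study): 3 bp per finger, random
# sequence with equal and independent base frequencies, ~3.1e9 bp genome.

BP_PER_FINGER = 3
HUMAN_GENOME_BP = 3.1e9  # approximate haploid human genome size (assumed)


def site_length(n_fingers: int) -> int:
    """Length in bp of the site recognized by an array of n fingers."""
    return n_fingers * BP_PER_FINGER


def expected_random_sites(n_fingers: int, genome_bp: float = HUMAN_GENOME_BP) -> float:
    """Expected chance occurrences of one specific site, scanning both strands."""
    n_possible = 4 ** site_length(n_fingers)  # number of distinct sequences of that length
    return 2 * genome_bp / n_possible


if __name__ == "__main__":
    for fingers in (3, 6):
        print(f"{fingers} fingers: {site_length(fingers)} bp, "
              f"~{expected_random_sites(fingers):.3g} expected chance matches")
```

Under these assumptions a 3-finger (9-bp) site is expected roughly 24,000 times by chance, whereas a 6-finger (18-bp) site is expected about 0.09 times, i.e., it is likely to be unique. Real genomic sequence is neither random nor uniform in base composition, so these numbers are only a rough guide.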
Dongying Wu, Sean C. Daugherty, Susan E. Van Aken, Grace H. Pai, Kisha L. Watkins, Hoda Khouri, Luke J. Tallon, Jennifer M. Zaborsky, Helen E. Dunbar, Phat L. Tran, Nancy A. Moran, Jonathan A. Eisen. “Metabolic Complementarity and Genomics of the Dual Bacterial Symbiosis of Sharpshooters.” PLoS Biology. In Press.
In the June issue of PLoS Biology, Jonathan Eisen and colleagues report on the sequencing and analysis of the genomes of bacterial symbionts that live inside the glassy-winged sharpshooter. The sharpshooter is a major agricultural pest and of great concern in California, as it is the vector that transmits Pierce's disease of grapevines. The study reveals that the sharpshooter relies on the bacteria to make nutrients (e.g., vitamins and amino acids) that it does not get from its diet of grape sap. Thus, the insect's dependence on these bacteria may be a weakness that could be exploited to better control its spread. In addition, of local interest, one of the symbionts (Baumannia cicadellinicola) was named after Paul Baumann, who recently retired from UC Davis.
Mark Bieda, Xiaoqin Xu, Michael A. Singer, Roland Green, and Peggy J. Farnham. "Unbiased location analysis of E2F1-binding sites suggests a widespread role for E2F1 in the human genome." Genome Res. 2006 16: 595-605. Published in advance April 10, 2006. doi:10.1101/gr.4887606.
Bieda et al. used ChIP-chip (chromatin immunoprecipitation followed by DNA microarray analysis) with high-density oligonucleotide tiling arrays to examine binding of the E2F1 transcription factor across 1% of the human genome. Their results suggest that E2F1 is recruited to promoters by a mechanism distinct from direct recognition of the known consensus E2F site, and point toward a new understanding of E2F1 as a factor that contributes to the regulation of a large fraction of human genes.
Sharon L. Squazzo, Henriette O’Geen, Vitalina M. Komashko, Sheryl R. Krig, Victor X. Jin, Sung Jang, Raphael Margueron, Danny Reinberg, Roland Green and Peggy J. Farnham. “Suz12 binds to silenced regions of the genome in a cell-type-specific manner.” Genome Res. published online Jun 2, 2006.
In this manuscript, Squazzo and colleagues use genome-wide ChIP-chip to identify thousands of promoters that are silenced by Polycomb repressive complexes (PRCs). Components of these complexes are upregulated in a large number of human tumors. Interestingly, they find that different regions of the human and mouse genomes are silenced in adult vs. embryonal tumors.
McHale, L., Tan, X., Koehl, P., Michelmore, R.W. (2006). Plant NBS-LRR proteins: adaptable guards. Genome Biology 7:212.
Most of the disease resistance genes in plants cloned to date encode nucleotide-binding site leucine-rich repeat (NBS-LRR) proteins characterized by nucleotide-binding site and leucine-rich repeat domains as well as variable amino- and carboxy-terminal domains. These large, abundant proteins are involved in the detection of diverse pathogens, including bacteria, viruses, fungi, nematodes, insects and oomycetes. Their precise mode of action is incompletely understood; they may monitor (guard) the status of targets of pathogen effectors rather than detecting pathogen-derived ligands directly. This review provides a current overview of the structure and function of this protein family and highlights recent advances.
West, M. A. L., van Leeuwen, H., Kozik, A., Kliebenstein, D. J., Doerge, R. W., St. Clair, D. A., Michelmore, R.W. (2006). High-density haplotyping with microarray-based expression and single feature polymorphism markers in Arabidopsis. Genome Research 16: 702-712.
Two types of genetic markers were developed from Affymetrix GeneChip microarrays to simultaneously generate both phenotypic (gene expression) and genotypic (marker) data. Gene expression markers (GEMs) are based on differences in transcript levels that exhibit bimodal distributions in segregating progeny. Single feature polymorphism (SFP) markers rely on differences in hybridization to individual oligonucleotide probes; our method identifies SFPs independent of a gene’s expression level. Using microarrays on a population to simultaneously measure gene expression variation and obtain genotypic data for a linkage map facilitates expression QTL analyses without the need for separate genotyping. Both marker types also offer opportunities for massively parallel mapping in unsequenced and less studied species.