THE BAYLOR HOPKINS CENTER FOR MENDELIAN GENOMICS (BHCMG)

The BHCMG is a research program funded by the National Human Genome Research Institute (NHGRI) and the National Heart, Lung, and Blood Institute (NHLBI) and designed to identify novel disease genes and variants underlying human Mendelian disease. At Baylor alone, we have enrolled over 5,000 families with over 500 different disease conditions in our study. Analysis of genetic data (primarily using whole exome sequencing) has enabled us to identify more than 200 new disease-associated genes (a list of all of the genes identified by the four Centers for Mendelian Genomics is available at http://mendelian.org/phenotypes-genes). At the same time, we have also been able to learn about new clinical features (‘expanded phenotypes’) of many genetic conditions. Keep reading to learn more about our discoveries related to several specific disease conditions.

TURKISH BRAIN MALFORMATION

Human nervous system development is a precisely orchestrated process requiring multiple complex cellular and molecular interactions. Neuronal development is broadly divided into three main processes: neurogenesis, neuronal migration, and postmigrational cortical organization/circuit formation. Disruption at any level of this process may cause a spectrum of neurodevelopmental disorders (NDD) including developmental delay/intellectual disability (DD/ID), structural brain malformations, epilepsy and neurobehavioral traits such as autism spectrum disorders and ADHD. The prevalence of intellectual disability is ~5%, resulting in a significant sociocultural and economic burden on society. With recent advances in genome technology and computational methods, there has been a tremendous increase in the number of genes identified as related to NDD, and surprising interactions between different basic cellular/molecular pathways have been revealed. We use whole exome sequencing (WES), transcriptome profiling (RNA-seq) and whole genome sequencing (WGS) techniques to explore novel genes/pathways causing NDD phenotypes.

ARTHROGRYPOSIS/NEUROMUSCULAR DISORDERS

Neuromuscular disorders (NMDs) are a clinically and genetically heterogeneous group of disorders resulting from a perturbation anywhere along the neuromuscular axis, from the brain to the muscle itself. Arthrogryposis is defined as contracture of the joints in at least two body segments and results from multiple conditions including neurological, neuromuscular junction and muscle disorders. Identification of the genetic cause in NMDs/arthrogryposis is crucial for determining the prognosis, managing patients, counseling families, considering possible early interventions and enabling patients to enroll in clinical trials. Despite the use of the most recent advanced genetic technology, including whole exome sequencing (WES) and array CGH, and of invasive tests such as muscle biopsy, the diagnostic yield ranges between 25% and 60%. Our laboratory is implementing WES, RNA-seq and WGS to explore the genetic etiology in our NMD and arthrogryposis cohort.

  • UNCOVERING GENETIC HETEROGENEITY IN ROBINOW SYNDROME

Autosomal dominant Robinow syndrome (DRS) is a genetically heterogeneous skeletal dysplasia characterized by mesomelic limb shortening that results from malfunctioning of non-canonical Wnt signaling components. In humans, several skeletal dysplasias, as well as disorders of bone mass, result from perturbation of Wnt signaling. For more than 40 years DRS remained molecularly undiagnosed in the vast majority of subjects, but we have recently discovered that two paralogous genes, DVL1 and DVL3, contribute to at least 30% of the cases. The encoded proteins act in WNT signal transduction and mediate the crosstalk between canonical and non-canonical WNT signaling pathways. The mechanism by which they cause DRS is remarkable: Robinow-associated variants in both DVL1 and DVL3 specifically affect the penultimate or last exon of either gene, consistently leading to a -1 frameshift with a premature stop codon in the last exon. Both transcripts escape nonsense-mediated decay. The resulting truncated proteins likely act in a dominant-negative or gain-of-function manner. Functional studies of DVL1 and DVL3 are currently ongoing, in addition to sample collection from new Robinow families without a molecular diagnosis, with the aim of finding novel genes and variants that can cause DRS.

  1. White JJ, et al. (2016) DVL3 Alleles Resulting in a -1 Frameshift of the Last Exon Mediate Autosomal-Dominant Robinow Syndrome. Am J Hum Genet. PubMed 26924530
  2. White J, et al. (2015) DVL1 frameshift mutations clustering in the penultimate exon cause autosomal-dominant Robinow syndrome. Am J Hum Genet. PubMed 25817016

  • EXPLORING THE GENETIC ETIOLOGY AND MECHANISM OF VERTEBRAL MALFORMATION (MANIFESTED AS CONGENITAL SCOLIOSIS) WITH NEXT GENERATION SEQUENCING

Vertebral malformation (VM) is a congenital deformity of the spine with an incidence of 0.5-1‰ [1, 2], which can manifest clinically as congenital scoliosis (CS). As a continuation of the previous study, we plan to perform whole exome sequencing (WES) on a large cohort of CS patients of Han Chinese ethnicity. By exploring a wide spectrum of variants throughout the human genome, we aim to identify rare, highly penetrant pathogenic variants for CS so as to advance our understanding of the genetic basis of VM.

  • TOOTH AGENESIS

Tooth agenesis (TA) is a common craniofacial anomaly with esthetic and functional consequences. More than 300 syndromes have been reported in association with tooth agenesis. Nevertheless, the majority of cases represent nonsyndromic forms, found as familial or sporadic traits, with considerable phenotypic heterogeneity. In familial cases, autosomal dominant inheritance is frequently observed, while autosomal recessive and X-linked inheritance have also been reported. TA can be classified based on the number of missing teeth as hypodontia (fewer than six teeth missing) or oligodontia (six or more teeth missing). Hypodontia is common, with a prevalence of 3% to 10% depending on the population, whereas oligodontia is rare, with a prevalence of less than 1%. Despite comprehensive efforts to define genetic susceptibility, the signaling pathways and causative genes associated with human TA have not been fully elucidated. In the Lupski lab, we applied whole-exome sequencing to identify novel putatively causal variants in known and novel genes for TA in multiplex families.
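The count-based classification above can be written as a small helper. This is only an illustrative sketch: the function name and the handling of a count of zero are assumptions made for the example, not part of any clinical tool.

```python
def classify_tooth_agenesis(missing_teeth: int) -> str:
    """Classify tooth agenesis by the number of missing teeth.

    Thresholds follow the text: fewer than six missing teeth is
    hypodontia, six or more is oligodontia. Returning "none" for a
    count of zero is an assumption made for this sketch.
    """
    if missing_teeth < 0:
        raise ValueError("missing_teeth must be non-negative")
    if missing_teeth == 0:
        return "none"
    return "hypodontia" if missing_teeth < 6 else "oligodontia"


for count in (0, 2, 5, 6, 10):
    print(count, classify_tooth_agenesis(count))
```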

  • PRIMARY IMMUNODEFICIENCY DISEASES (PIDD)

Primary immunodeficiency diseases are inheritable conditions characterized by deficient or dysregulated immune function. Over 300 distinct primary immunodeficiency diseases have been recognized by the International Union of Immunological Societies. As a collaborative effort with the Texas Children’s Hospital Center for Human Immunobiology, the Center for Mendelian Genomics has focused upon identifying novel causes of PIDDs through a combination of genetic analyses and biological validation testing. This project remains ongoing and has yielded valuable and important contributions to the fields of both genetics and immunology:

  1. Stray-Pedersen A, et al. (2014) PGM3 mutations cause a congenital disorder of glycosylation with severe immunodeficiency and skeletal dysplasia. Am J Hum Genet. PubMed 24931394
  2. Bayer DK, et al. (2014) Vaccine-associated varicella and rubella infections in severe combined immunodeficiency with isolated CD4 lymphocytopenia and mutations in IL7R detected by tandem whole exome sequencing and chromosomal microarray. Clin Exp Immunol. PubMed 25046553
  3. Stray-Pedersen A, et al. (2014) Compound heterozygous CORO1A mutations in siblings with a mucocutaneous-immunodeficiency syndrome of epidermodysplasia verruciformis-HPV, molluscum contagiosum and granulomatous tuberculoid leprosy. J Clin Immunol. PubMed 25073507
  4. Watkin LB, et al. (2015) COPA mutations impair ER-Golgi transport and cause hereditary autoimmune-mediated lung disease and arthritis. Nat Genet. PubMed 25894502
  5. Lalani SR, et al. (2016) Recurrent Muscle Weakness with Rhabdomyolysis, Metabolic Crises, and Cardiac Arrhythmia Due to Bi-allelic TANGO2 Mutations. Am J Hum Genet. PubMed 26805781
  6. Yu H, et al. (2016) Rapid molecular diagnostics of severe primary immunodeficiency determined by using targeted next-generation sequencing. J Allergy Clin Immunol. PubMed 27484032

COMPUTATIONAL TOOLS

  • HMZDELFINDER

We developed HMZDelFinder, a computational tool for detecting rare homozygous and hemizygous intragenic CNVs from WES data. We applied this algorithm to 4866 samples in the Baylor–Hopkins Center for Mendelian Genomics (BHCMG) cohort and detected 773 HMZ deletion calls (567 homozygous and 206 hemizygous) with an estimated sensitivity of 86.5% and precision of 78%. Of the 773 HMZDelFinder-detected deletion calls, 82 were subjected to array comparative genomic hybridization (aCGH) and/or breakpoint PCR, and 64 were confirmed. These include 18 single-exon deletions, of which 8 were detected exclusively by HMZDelFinder and not by any of the seven other CNV detection tools examined. Of the 64 validated calls, 15 were found to be pathogenic CNVs. The source code for this tool is available online at https://github.com/BCM-Lupskilab/HMZDelFinder, and the method is published in Nucleic Acids Research: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5389578/
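The counts quoted above can be checked with a few lines of arithmetic. The sketch below is not part of HMZDelFinder. The variable names are illustrative, and it assumes that the reported precision corresponds to the fraction of experimentally tested calls that were confirmed.

```python
# Counts taken from the paragraph above; variable names are illustrative.
homozygous_calls = 567
hemizygous_calls = 206
total_calls = homozygous_calls + hemizygous_calls

tested_calls = 82     # calls tested by aCGH and/or breakpoint PCR
confirmed_calls = 64  # calls confirmed by those assays

# Assumption: precision is estimated as confirmed calls / tested calls.
confirmed_fraction = confirmed_calls / tested_calls

print(f"total HMZ deletion calls: {total_calls}")
print(f"confirmed fraction of tested calls: {confirmed_fraction:.1%}")
```

Under that assumption, the confirmed fraction rounds to the 78% precision stated in the text.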

  • DNM-FINDER

We developed in-house software called DNM-Finder (de novo mutation finder) to identify de novo variants from WES data. The source code for this tool is available online: https://github.com/BCM-Lupskilab/DNM-Finder. In this algorithm, parental variants were subtracted in silico from the proband's variants in VCF files, while incorporating read number information extracted from BAM files. Filtering was then implemented using the following criteria: 1) an alternative variant read count greater than 5 in the proband, 2) a ratio of alternative variant read count to reference variant read count greater than 30% in the proband, 3) a reference variant read count greater than 10 in both parents, and 4) a ratio of alternative variant read count to reference variant read count less than 5% in both parents.
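The four filtering criteria can be expressed as a single predicate. The following is a minimal sketch, not the DNM-Finder source code: the ReadCounts class and the function names are hypothetical, read counts are assumed to have already been extracted from the BAM files, and treating a reference count of zero as an infinite ratio is an assumption of this example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ReadCounts:
    """Reference and alternative read counts at one site (hypothetical container)."""
    ref: int
    alt: int


def alt_to_ref_ratio(counts: ReadCounts) -> float:
    # Ratio of alternative to reference reads, as worded in the text.
    # Assumption: a site with no reference reads has an infinite ratio.
    if counts.ref == 0:
        return float("inf")
    return counts.alt / counts.ref


def passes_dnm_filters(proband: ReadCounts, parents: Iterable[ReadCounts]) -> bool:
    # 1) alternative read count greater than 5 in the proband
    if proband.alt <= 5:
        return False
    # 2) alt/ref ratio greater than 30% in the proband
    if alt_to_ref_ratio(proband) <= 0.30:
        return False
    for parent in parents:
        # 3) reference read count greater than 10 in each parent
        if parent.ref <= 10:
            return False
        # 4) alt/ref ratio less than 5% in each parent
        if alt_to_ref_ratio(parent) >= 0.05:
            return False
    return True


candidate = passes_dnm_filters(
    ReadCounts(ref=20, alt=15),
    [ReadCounts(ref=30, alt=0), ReadCounts(ref=25, alt=1)],
)
print(candidate)
```

Criterion 3 doubles as a coverage check: a variant is only called de novo when both parents have enough reference reads to make the absence of the variant believable.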

  • UPD-FINDER

To detect potential heterodisomic and isodisomic UPD regions in Baylor-CMG data, we will use the ~800 trio whole exome sequencing (WES) data sets available in our database, which contains ~7000 WES data sets in total. Next, we will scan all of the SNVs in the WES data of the proband and parents in each trio and extract all of the informative SNVs, defined as sites where the parents are AA (Parent1) and BB (Parent2), to detect potential UPD stretches. Sites at which the alleles do not segregate in the proband (proband genotype AA or BB) will be considered 100% informative SNVs and flagged as the starting point for detecting potential AOH UPD regions. Using a sliding-window approach, we will then check each subsequent SNV that is 100% informative or 50% informative in the parents until we find a 100% informative SNV that segregates correctly. Then we will merge the adjacent UPD stretches using the circular binary segmentation (CBS) algorithm. This tool will be available online soon.
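The first step of this plan, finding fully informative sites where the proband does not inherit one allele from each parent, can be sketched as follows. This is an illustration only and not the UPD-Finder implementation: genotypes are encoded as alternative-allele counts (0 = AA, 1 = AB, 2 = BB), all function names are hypothetical, and the 50% informative sites, the sliding window and the CBS merging step are not covered.

```python
from typing import List, Optional, Tuple

HOM_REF, HET, HOM_ALT = 0, 1, 2  # genotype encoded as alternative-allele count


def is_fully_informative(parent1: int, parent2: int) -> bool:
    # 100% informative site: the parents are homozygous for different alleles.
    return {parent1, parent2} == {HOM_REF, HOM_ALT}


def upd_candidate_parent(child: int, parent1: int, parent2: int) -> Optional[str]:
    """Return the parent whose alleles the child appears to carry exclusively."""
    if not is_fully_informative(parent1, parent2):
        return None
    if child == HET:
        return None  # one allele from each parent: segregates correctly
    return "parent1" if child == parent1 else "parent2"


def flag_upd_sites(trio_sites: List[Tuple[int, int, int, int]]) -> List[Tuple[int, str]]:
    """trio_sites holds (position, child, parent1, parent2) genotype tuples."""
    flagged = []
    for position, child, parent1, parent2 in trio_sites:
        source = upd_candidate_parent(child, parent1, parent2)
        if source is not None:
            flagged.append((position, source))
    return flagged


sites = [
    (100, HOM_REF, HOM_REF, HOM_ALT),  # child AA, parents AA x BB: flagged
    (200, HET, HOM_REF, HOM_ALT),      # child AB: segregates correctly
    (300, HOM_ALT, HOM_REF, HOM_ALT),  # child BB: flagged
    (400, HOM_REF, HOM_REF, HET),      # parents AA x AB: not fully informative
]
print(flag_upd_sites(sites))
```

A single flagged site can also reflect a genotyping error or a deletion in the proband, which is presumably why the plan extends flagged sites into stretches before calling a region.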

  • AAMR-PREDICTOR

Alu elements, short interspersed elements numbering more than 1 million copies per human genome, are often found at the breakpoint junctions of genomic rearrangements that can cause disease. To identify the impact of Alu/Alu-mediated rearrangement (AAMR) on genomic variation and human health, we performed a bioinformatic analysis to characterize CNV-Alus, the Alu elements involved in mediating CNVs, and predicted genome-wide AAMR hotspot loci.

AAMR-Predictor was developed to query a gene or a pair of genomic regions for predicted AAMR hotspots. It is publicly available at http://alualucnvpredictor.research.bcm.edu:3838/

  • PHENOMATCHER

To assist accurate and comprehensive analysis of exome data, we developed an interactive and publicly available web tool, PhenoMatcher (http://genomicanalysis.research.bcm.edu:3838/PhenoMatcher/). PhenoMatcher takes as input a set of Human Phenotype Ontology (HPO) terms that describe the phenotypes of your patient and returns phenotypic similarity scores for each disease gene recorded in the Online Mendelian Inheritance in Man (OMIM) database. If a gene has been associated with multiple diseases, the maximum score is reported as ‘PhenoMatcher_score_max’ and all of the scores are summarized in a ‘scores’ column, ordered to match the diseases listed in ‘dz_ID_all’. The output table is searchable and can be downloaded directly to a local device.
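The per-gene summary described above can be illustrated with a short sketch. This is not PhenoMatcher code and does not compute similarity scores: the input scores, gene names and disease identifiers below are made-up placeholders, the function name is hypothetical, and the "|" delimiter is an assumption. Only the three column names come from the description of the output table.

```python
from typing import Dict, List, Tuple


def summarize_gene_scores(
    disease_scores: Dict[str, List[Tuple[str, float]]]
) -> List[dict]:
    """Collapse per-disease similarity scores into one row per gene.

    disease_scores maps a gene to (disease ID, score) pairs. The scores
    are assumed to have been computed elsewhere.
    """
    rows = []
    for gene, pairs in disease_scores.items():
        disease_ids = [disease_id for disease_id, _ in pairs]
        scores = [score for _, score in pairs]
        rows.append({
            "gene": gene,
            "PhenoMatcher_score_max": max(scores),
            # Assumption: scores and disease IDs are joined in the same order.
            "scores": "|".join(f"{score:.2f}" for score in scores),
            "dz_ID_all": "|".join(disease_ids),
        })
    return rows


# Placeholder input: hypothetical genes, disease IDs and scores.
example = {
    "GENE1": [("DZ_A", 0.20), ("DZ_B", 0.70)],
    "GENE2": [("DZ_C", 0.45)],
}
for row in summarize_gene_scores(example):
    print(row)
```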

STRUCTURAL VARIATION MUTAGENESIS

Contributions from the Lupski laboratory have significantly increased our understanding of mutational mechanisms and human disease. These contributions span basic DNA chemistry to ground-breaking applications of genomic technologies. Recent studies focus on genomic instability and rearrangements, disease-related mutations and CNV, replication and mutagenic mechanisms, template switching, Alu-Alu recombination and technological advances in whole genome and exome sequence analysis of Mendelian disease. The potential complexity of structural variants (SV) was not envisioned prior to work by the Lupski lab; thus, the frequency of complex genomic rearrangements (CGR) and how such events form remained a mystery. The concept of genomic disorders (diseases due to genomic rearrangements, rather than sequence-based changes, in which genomic architecture incites genomic instability) delineated a new category of conditions distinct from chromosomal syndromes and single-gene Mendelian diseases. Nevertheless, it is the mechanistic understanding of CNV/SV formation that has promoted further understanding of human biology and disease and provided insights into human genome and gene evolution. Our research has led to the concept of genomic disorders, which refers to syndromes that originate not from point mutations that affect the function of specific genes, but from duplications, deletions and other structural variants involving different genomic regions.

  1. Lupski JR, et al. (1991) DNA duplication associated with Charcot-Marie-Tooth disease type 1A. Cell. PubMed 1677316
  2. Chen KS, et al. (1997) Homologous recombination of a flanking repeat gene cluster is a mechanism for a common contiguous gene deletion syndrome. Nat Genet. PubMed 9326934
  3. Lee JA, Carvalho CM and Lupski JR. (2007) A DNA replication mechanism for generating nonrecurrent rearrangements associated with genomic disorders. Cell. PubMed 18160035
  4. Mayle R, et al. (2015) DNA REPAIR. Mus81 and converging forks limit the mutagenicity of replication fork breakage. Science. PubMed 26273056

  • COMPLEX GENOMIC REARRANGEMENTS (CGR)

Our laboratory demonstrated that CNV could be very complex when mediated by template switches between inverted repeats, leading to a rearrangement end-product structure, revealed by array CGH, consisting of a duplication followed by a triplicated segment that was inverted and embedded within the duplication (DUP-TRP/INV-DUP). Furthermore, extreme mutational complexity could be observed in a genome, resulting in a very complex chromosomal structure. This phenomenon was first observed in cancer genomes and referred to as chromothripsis. Our laboratory identified the phenomenon in the constitutional genomes of individuals undergoing clinical array testing because of delays in neurologic development. The presence of triplicated segments, as well as evidence for extensive template switching and templated insertions at breakpoint junctions stitched together by microhomology, was consistent with a replicative mechanism; we referred to the observed phenomenon as chromoanasynthesis. Further complexities of mutagenesis were found by careful study of breakpoint junctions, which demonstrated a mutational signature of the replicative mechanism whereby point mutations (SNV) were induced at a rate 1,000-fold greater than that observed with intergenerational DNA polymerases. In the complex rearrangements one could observe evidence for template switching between chromosome homologues rather than sister chromatids, leading to large segments of absence of heterozygosity (AOH). Thus, structural variation could be accompanied by both point mutagenesis and template switching, which could distort Mendelian expectations because it resulted in a region of uniparental disomy.

  1. Carvalho CM, et al. (2011) Inverted genomic segments and complex triplication rearrangements are mediated by inverted repeats in the human genome. Nat Genet. PubMed 21964572
  2. Liu P, et al. (2011) Chromosome catastrophes involve replication mechanisms generating complex genomic rearrangements. Cell. PubMed 21925314
  3. Carvalho CM, et al. (2013) Replicative mechanisms for CNV formation are error prone. Nat Genet. PubMed 24056715
  4. Liu P, et al. (2017) An Organismal CNV Mutator Phenotype Restricted to Early Human Development. Cell. PubMed 28235197

  • MARKER CHROMOSOMES

One of our projects involves the molecular characterization and functional assessment of marker chromosomes. Small supernumerary marker chromosomes (sSMC) are chromosomal fragments that are found in addition to the normal complement of 46 human chromosomes. Their overall size can vary greatly, but generally any abnormal fragment smaller than chromosome 20 is considered a marker. This small size also makes them difficult to characterize meaningfully from a genomic standpoint, complicating clinical decisions when they are detected. Using a combination of genomic techniques, our lab aims to define marker chromosomes at nucleotide-level resolution to better understand not only the mechanisms leading to their formation but also their true molecular structure.

  • PELIZAEUS–MERZBACHER DISEASE (PMD)

Pelizaeus–Merzbacher disease (PMD) is a rare X-linked dysmyelinating leukodystrophy of the central nervous system. PMD manifests with a wide spectrum of clinical severity caused by different classes of mutations affecting the dosage-sensitive gene, PLP1. PLP1 copy-number gain is the predominant cause of PMD; however, cataloging each of the observed patterns of complex structural variations (SVs) at this locus remains to be performed. We have molecularly studied 148 PMD samples with non-recurrent SVs involving PLP1; 138 were of copy-number gains and ten were of copy-number losses. Herein we characterize the novel PLP1 SVs (61/138) and compare these data to our previous findings. Consistently, many of the novel copy-number gains have one or more breakpoints overlapping critical low-copy repeats (LCRs) in the region, termed PMD-LCRs. Additionally, we find that 56% of all described copy-number gain events appear as simple duplications, whereas 44% present as complex genomic rearrangements (CGRs). Duplication followed by triplication and subsequent duplication, DUP-TRP-DUP, is the predominantly observed pattern (43%) of all examined copy-number gain CGR cases. Another commonly observed CGR pattern is interspersed duplication separated by a copy-neutral segment, DUP-NML-DUP (33%). Importantly, we report additional samples of known but less frequently observed patterns. Our analyses further support the contention that PMD patients predominantly harbor copy number gains, many with complexities and LCR involvement. These data indicate that replicative DNA repair processes are often involved in PMD rearrangements. Furthermore, our study has shown that deletions of PLP1 range from 6.7 kb to 5.6 Mb in size and in contrast to gain events, deletions of PLP1 were rarely associated with complexities; wherein 90% appear as simple events. 
Sequence-mapped breakpoints of deletions overlap with distinctive genomic architecture in each of the studied cases including Alu elements, LINEs, as well as LCRs. Finally, we investigated potential molecular mechanisms for the penetrance and variable phenotypic expressivity in deletion carrier females by exploring X-inactivation patterns as well as possible bi-allelic variation in the PLP1 region. This study expands our knowledge of CGR patterns at a discrete locus, their mechanisms of formation, and the molecular pathogenicity of PMD.

OTHER COLLABORATIVE PROJECTS

  • GOLDENHAR SYNDROME

Also called: Hemifacial Microsomia, Oculoauriculovertebral Spectrum Dysplasia, or Facioauriculovertebral Sequence

Goldenhar Syndrome is a common birth defect affecting the differentiation of the first and second branchial arches and their derivatives. While widely variable, the tissues usually involved are the external and middle ear structures, the mandible and adnexa, and other facial tissues on the same side (orbit, ear tags and cartilaginous parts, eyelid and eye, nose, neck). In some cases, anomalies of the cervical and thoracic vertebrae and even cardiac anomalies are described. Most cases are isolated, although rare dominant transmission has been suggested. A plethora of purported genes and chromosomal deletions and rearrangements have been reported with few consistent explanations. This spectrum of phenotypes is therefore a logical target, both because cases are available for collection and because tactical approaches exist to analyze and solve them. Thus far, we have collected XXX cases and YYY trios for (Trio) Exome/Genome analysis.

  • HALLERMANN-STREIFF SYNDROME

Hallermann-Streiff Syndrome (HSS) is an extremely rare (estimated 1 per 1,125,000 live births) but clinically highly homogeneous ectodermal dysplasia characterized by brachycephaly, frontal bossing, thin hair, congenital cataracts, micrognathia, various dental anomalies, and short stature. Every reported case has been isolated; no unaffected parents have had a second affected child, and no affected individual has had an affected child. Despite its eponymic description in 1948 and 1950, every historical attempt using then-conventional genetic approaches has failed to identify any reasonable candidate explanation. We have recently enrolled a dozen families with HSS, five from Turkey and seven from the United States, for Trio Exome/Genome analysis to solve this fascinating mystery.