Research
Decoding the human genome and other mammalian genomes to understand human diseases is our primary research interest, and comparative genomics and integrated analysis of functional genomics data are our major research approaches. In general, we are interested in developing and applying computational methods for exploiting patterns in high-throughput genomic and epigenomic data and for extracting biological information from experimental and computational big data. Since the lab was established in 2007, we have been developing bioinformatics approaches and software for studying the expression, regulation, and evolution of genes (both coding and non-coding) in the human cardiac and nervous systems, especially those implicated in neuropsychiatric disorders and congenital heart diseases. We have substantial experience in the analysis, interpretation, and integration of large experimental datasets from next-generation sequencing (e.g., ChIP-seq, ATAC-seq, RNA-seq, and BS-seq) at the bulk-tissue and single-cell levels, supported by more than 200 preprints and publications. Using bioinformatics and in collaboration with experimental investigators, we want to understand how a single human genome is used to establish and maintain distinct cell lineages and how this important process is altered during disease development. Our research group focuses on multiple areas.
Areas
Integrated analysis of functional genomics data: to explore computational techniques for combining experimental and computational genomics data in order to achieve a global understanding of the function of the human genome.
We are interested in developing bioinformatics algorithms to mine big genomics data and to conduct cross-genome sequence comparisons (e.g., syntenic assignment), with a focus on deciphering the gene regulatory networks underlying normal development and developmental disorders. Because large amounts of functional genomic data from diverse sources (microarray, high-throughput sequencing, protein-protein interaction, perturbation, etc.) are combined and used in our studies, our group develops and applies effective computational methods for data integration while also addressing common concerns about data quality and visualization in genome-scale experiments. We believe that a rigorous statistical framework needs to be built in order to extract biologically meaningful signals from noise and stochastic background.
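As one small illustration of what separating signal from noise can involve in a genome-scale experiment, the sketch below applies the Benjamini-Hochberg procedure to control the false discovery rate when many genes are tested at once. It is a generic textbook technique, not a description of the lab's own framework; the function name `benjamini_hochberg` and the example p-values are assumptions made for illustration.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Hypothetical helper for illustration only.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up per-gene p-values, one per tested gene.
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p={p:.3f}  adjusted={q:.3f}")
```

Genes whose adjusted value falls below a chosen threshold would then be reported; the threshold itself is a study-specific choice.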
Gene regulatory networks: to construct and understand gene regulatory networks in heart development and disease.
We apply genomic and bioinformatic research to study the genetic networks underlying heart development and congenital heart disorders, in collaboration with experts at Einstein and other institutions and using transcriptomics data from bulk heart tissues or single cells from mouse models. Over the last few years, the work has expanded to adult heart diseases, e.g., heart failure. In these collaborative team projects, students and postdocs in the lab lead the bioinformatics work: bulk and single-cell transcriptomics and epigenomics analysis, transcriptional regulatory network construction, and pathway discovery.
Cancers and TME: to study the molecular and cellular pathways for cancer development and therapy.
Identifying genetic and epigenetic perturbations and characterizing the tumor microenvironment (TME) using big data and advanced computing are critical for developing novel therapeutic strategies for cancers. We have collaborated successfully and productively with multiple investigators at Einstein and other institutions to address the challenges of integrated analysis of large and often heterogeneous multimodal genomics data from cancer models or patient-derived samples.
iPSC system for studying the genetic basis of psychiatric disorders: to use iPSC technology and systems genomics approaches for studying neural development and abnormal gene regulation in neuropsychiatric disorders like schizophrenia and autism spectrum disorders.
In collaboration with experimental experts, we grow human neurons in a dish using induced pluripotent stem cell (iPSC) technology in order to model human neuronal development and differentiation. We begin by developing iPSC lines from both patients and healthy subjects, differentiate them into neural progenitors and neurons, and then use RNA-seq and other deep sequencing technologies to identify differentially regulated genes by comparing the transcriptomes of patient-derived neurons and controls. Using this systems biology approach, we have identified many novel long non-coding RNA genes that are involved in embryonic neurogenesis and potentially in neuropsychiatric disorders. We have also found that many genes show allele-biased expression in different brain regions, including some that have been implicated in major psychiatric disorders, which may help explain some aspects of parent-of-origin effects, twin discordance, and reduced penetrance. Our studies also continue to uncover molecular pathways and cell-cell communications affected by critical genes that are major risk factors for schizophrenia and autism.
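The idea of allele-biased expression mentioned above can be illustrated with a minimal sketch: at a heterozygous site, RNA-seq reads carrying each allele are counted and compared against the 50:50 split expected without bias, here using a two-sided exact binomial test. This is a simplified illustration, not the lab's actual pipeline; the function name `allelic_bias_pvalue`, the site labels, and the read counts are all hypothetical, and the sketch ignores practical issues such as reference-mapping bias and overdispersion.

```python
from math import comb


def allelic_bias_pvalue(ref_reads: int, alt_reads: int) -> float:
    """Two-sided exact binomial p-value against a 50:50 allelic ratio.

    Hypothetical helper for illustration only.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of bias
    k = min(ref_reads, alt_reads)
    # Probability of seeing k or fewer reads from the minor allele
    # when both alleles are equally likely.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # The null distribution is symmetric, so double the tail and cap at 1.
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Made-up read counts (reference allele, alternative allele) per site.
    sites = {"site_A": (48, 52), "site_B": (90, 10), "site_C": (3, 7)}
    for name, (ref, alt) in sites.items():
        print(name, ref, alt, allelic_bias_pvalue(ref, alt))
```

When many sites are tested, the resulting p-values would normally be corrected for multiple testing before any gene is called allele-biased.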
Gene duplication and retro-transposition: to investigate how gene duplication and transposition have shaped the human genome and how novel genes or regulatory elements emerge in humans and other primates.
Gene expansion (through either duplication or retrotransposition) is a major driving force for the emergence of novel functions during evolution. Inter- and intra-genome sequence comparison can reveal DNA elements that are either uniquely present or specifically selected in certain species. Tracking the evolutionary history of such sequences can lead to the discovery of genes or regulatory elements (including non-coding RNAs) that function specifically in humans. Thus, the long-term goal is to search for functional genomic components that set us apart from other animals. To this end, our research compares the human genome with other mammalian genomes (e.g., chimpanzee, macaque, dog, and mouse) to explore the role of gene duplication in generating novel protein-protein interactions and novel biochemical pathways. We are interested in both the birth and the death (pseudogenization or loss) of such lineage-specific functional DNA sequences, especially those involved in the development of the nervous system and brain.