Volume 33, Issue 11 p. 3254-3265
Embryonic Stem Cells/Induced Pluripotent Stem Cells

Genome-Wide Identification of MESP1 Targets Demonstrates Primary Regulation Over Mesendoderm Gene Activity

Benjamin Soibam

Texas Heart Institute, Texas Medical Center, Houston, Texas, USA

Department of Biology and Biochemistry, University of Houston, Houston, Texas, USA

Ashley Benham

Texas Heart Institute, Texas Medical Center, Houston, Texas, USA

Jong Kim

Department of Biology and Biochemistry, University of Houston, Houston, Texas, USA

Kuo-Chan Weng

The Institute of Biosciences and Technology, Texas A & M University Health Science Center, Houston, Texas, USA

Litao Yang

Texas Heart Institute, Texas Medical Center, Houston, Texas, USA

Xueping Xu

Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas, USA

Matthew Robertson

Texas Heart Institute, Texas Medical Center, Houston, Texas, USA

Alon Azares

Texas Heart Institute, Texas Medical Center, Houston, Texas, USA

Austin J. Cooney

Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas, USA

Robert J. Schwartz

Corresponding Author

Texas Heart Institute, Texas Medical Center, Houston, Texas, USA

Department of Biology and Biochemistry, University of Houston, Houston, Texas, USA

Yu Liu

Corresponding Author

Department of Biology and Biochemistry, University of Houston, Houston, Texas, USA

Correspondence: Yu Liu, Ph.D., Department of Biology and Biochemistry, University of Houston, Houston, Texas 77004, USA. Telephone: 713-743-8173; e-mail: yliu54@uh.edu; or Robert J. Schwartz, Ph.D., Department of Biology and Biochemistry, University of Houston, Houston, Texas 77004, USA. Telephone: 713-743-6595; e-mail: rjschwartz@uh.edu
First published: 21 July 2015
Citations: 20

Abstract

MESP1 is considered the first sign of the nascent cardiac mesoderm and plays a critical role in the appearance of cardiac progenitors, while exhibiting transient expression in the developing embryo. We profiled the transcriptome of a pure population of differentiating MESP1-marked cells and found that they chiefly contribute to the mesendoderm lineage. High-throughput sequencing of endogenous MESP1-bound DNA revealed that MESP1 preferentially binds to two variants of E-box sequences and activates critical mesendoderm modulators, including Eomes, Gata4, Wnt5a, Wnt5b, Mixl1, T, Gsc, and Wnt3. These mesendoderm markers were enriched in the MESP1-marked population before the appearance of cardiac progenitors and myocytes. Further, MESP1 binding is globally associated with H3K27 acetylation, supporting a novel pivotal role for MESP1 in regulating the epigenetic state of its target genes. Therefore, MESP1, the pioneer cardiac factor, primarily directs the appearance of mesendoderm, the intermediary of the earliest progenitors of mesoderm and endoderm organogenesis. Stem Cells 2015;33:3254–3265

Significance Statement

MESP1 has long been known as the master regulator of the cardiovascular system. However, recent evidence suggests that its role in embryogenesis may be broader than currently recognized. We used unbiased whole-genome approaches to identify MESP1 targets and found that MESP1 chiefly activates genes pertinent to the mesendoderm program and thus initiates a potent auto-regulatory program to direct the earliest cardiac and endoderm progenitors. MESP1 may also recruit histone acetylation to target genes to establish a poised status, ready for expression during mesendoderm-derived organogenesis. This work provides important knowledge for guiding the regeneration of critical organs (heart, blood, gut, etc.).

Introduction

MESP1, a basic helix-loop-helix transcription factor, has long been known for its role in cardiovascular development. Recent evidence indicates that MESP1's role in development is broader than currently recognized. It is the first sign of the nascent cardiac mesoderm, but it also marks the appearance of hematopoietic stem cells, head skeletal mesoderm, and endoderm-derived foregut 1, 2. Ablation of both the Mesp1 and Mesp2 tandem genes led to the absence of the heart, accompanied by loss of most anterior structures 3. Identifying the early cell lineages regulated by MESP1 is important for regeneration of the related critical tissues (heart, blood, gut, etc.). This requires identification of MESP1 direct targets in a specific and unbiased fashion.

MESP1 interacts with genomic DNA via E-box elements and regulates downstream gene expression 4. Previously, MESP1 gene targets were identified by a candidate gene approach involving induced overexpression of tagged MESP1 in murine embryonic stem cells, followed by microarray analysis of the transcriptome of the entire embryonic stem cell (ESC) population. From these data, a few candidate genes were identified, and MESP1 chromatin immunoprecipitated (ChIP) DNA was analyzed to validate them. These studies, as a first-degree approximation of endogenous MESP1 signaling, defined MESP1 as the most critical factor for cardiovascular development 5. Even with the advent of deep sequencing technologies, it is still challenging to identify in an unbiased manner the transcriptome and chromatome prompted by MESP1-fated cells, especially because MESP1 expression is restricted by cell type to a group of early progenitor cells and temporally to E6–7.5 during embryogenesis 6, 7, and is accordingly transiently expressed in a small percentage (3%–5%) of differentiating ESCs 7, 8.

We have taken steps to circumvent this problem by generating a mouse Mesp1Cre/+; Rosa26EYFP/+ reporter ESC line (referred to herein as UH3 cells). The endogenous Mesp1 promoter drives the expression of a knock-in Cre recombinase, which in turn activates the expression of EYFP in the Rosa26 locus. These cells are hence permanently YFP-marked. This enables us to follow MESP1 progeny over time even though the endogenous Mesp1 gene becomes repressed. We used this system to study the transcriptome and chromatome of cells that were specified only by endogenous MESP1 signaling. Surprisingly, MESP1 primarily directs the appearance of mesendoderm, an early precursor cell, which gives rise to mesoderm (specifying cardiac, blood, and bone cells) and endoderm (specifying foregut endoderm and the pancreas) structures.

Materials and Methods

ESC Culture and Differentiation

Mesp1Cre/+; Rosa26EYFP/+ reporter (UH3) cells were established by the conventional blastocyst outgrowth method. Ab2.2 ESCs were used as a control. Cells were cultured in knockout DMEM (GIBCO, NY, http://www.lifetechnologies.com/us/en/home/brands/gibco.html) with 15% stem cell grade fetal bovine serum (Atlanta Biologicals, GA, https://www.atlantabio.com/), 1% antibiotic/antimitotic (GIBCO, NY), 100 μM β-mercaptoethanol, and 2 mM l-glutamine (GIBCO, NY), supplemented with 1000 U/ml of leukemia inhibitory factor (LIF) to maintain pluripotency. The medium was changed daily, and the cells were passaged every 3 days.

Before differentiation, the UH3 cells were FACS-sorted to remove all YFP+ cells arising from low-level spontaneous differentiation. Ab2.2 and UH3 cells were differentiated using the hanging drop method. Briefly, 20 μl drops of LIF-free ES media containing 400 cells were placed on the lid of a square petri dish. The bottom of the dish was filled with 5 ml autoclaved water and the drop-containing lid was inverted over the dish. The embryoid bodies were plated on day 5 onto gelatin-coated dishes. LIF-free ES media was refreshed every 3 days. For initial differentiation characterization studies, the ES media was supplemented with 10 ng/ml human BMP-4 (Peprotech, NJ, https://www.peprotech.com/).
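As a small worked example of the hanging drop arithmetic, the sketch below converts the stated 400 cells per 20 μl drop into a cell suspension density. The function names, the number of drops, and the pipetting overage are illustrative assumptions, not values from the protocol.

```python
def seeding_density_cells_per_ml(cells_per_drop: int, drop_volume_ul: float) -> float:
    """Cell concentration (cells/ml) so that each drop holds the desired cell number."""
    return cells_per_drop / drop_volume_ul * 1000.0  # 1 ml = 1000 ul


def suspension_volume_ml(n_drops: int, drop_volume_ul: float, overage: float = 0.1) -> float:
    """Suspension volume (ml) for n_drops drops; `overage` is a hypothetical pipetting margin."""
    return n_drops * drop_volume_ul * (1.0 + overage) / 1000.0


# Values from the text: 400 cells in each 20 ul drop.
density = seeding_density_cells_per_ml(400, 20.0)

# Hypothetical plate of 100 drops with a 10% margin (not from the source).
volume = suspension_volume_ml(100, 20.0, overage=0.1)
```

With the stated values the suspension would be prepared at 20,000 cells per ml.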

Detailed protocols related to immunostaining, karyotyping, RNA isolation, chromatin immunoprecipitation, quantitative polymerase chain reaction (qPCR), next-generation sequencing, and bioinformatics analysis are provided in Supporting Information Materials and Methods.

Results

Characterization of a Mesp1 Lineage-Tracking ESC Line

A Mesp1-lineage reporter mouse ESC line (UH3) was isolated from E3.5 blastocysts resulting from a Mesp1Cre/+ and Rosa26EYFP/EYFP cross (Fig. 1A). The cells bore a Mesp1Cre/+; Rosa26EYFP/+ genotype, showed characteristic ESC morphology (Supporting Information Fig. S1A), and had a normal male karyotype (Supporting Information Fig. S1B). To characterize the UH3 cell line and confirm that it undergoes differentiation similar to wildtype ESC lines such as Ab2.2, we aggregated the cells in hanging drops, harvested the embryoid bodies over time, and examined gene expression using quantitative real-time PCR (qRT-PCR). As expected, we observed a transient induction of Mesp1 expression in UH3 cells similar to the Ab2.2 control (Supporting Information Fig. S2; Fig. 1B, 1C). Mesp1 transcript expression coincided with the upregulation of the Gata4 and Tbx5 genes, while Nkx2.5 and Mef2c expression increased about 1–2 days after Mesp1 expression had diminished, marking the induction of the cardiac progenitor cell program (Supporting Information Fig. S2). Cell surface markers appeared sequentially; among them, Cxcr4, a marker for mesendoderm, appeared concurrently with the mesoderm markers Pdgfra and Flk1 9 (Supporting Information Fig. S2). Increased expression of Sirpa and Alcam, markers of blood and cardiac myocytes, respectively 10, followed several days later, indicating that the UH3 cell line follows a multi-mesoderm lineage differentiation program 11, 12. We further confirmed that UH3 cells undergo cardiac differentiation by observing well-organized sarcomeres in YFP+ UH3 cells 8 days post-hanging drop (Supporting Information Fig. S1C). Finally, UH3 cells in culture exhibited rhythmic beating (Supporting Information Fig. S1D).

Figure 1. Methodology for MESP1-lineage tracing. (A): The strategy for generating the mouse Mesp1Cre/+; Rosa26EYFP/+ reporter embryonic stem cell (ESC) line. (B): The expression of Mesp1 mRNA during ESC differentiation, as assayed by real-time RT-PCR. (C): The expression of MESP1 during ESC differentiation, as determined by western blot. (D): Percentage of fluorescence-activated cell sorting-sorted YFP+ MESP1 lineage cells during the course of ESC differentiation. (E): Schematic details of the methodology for MESP1-lineage tracing and whole-genome analyses. Abbreviations: EB, embryoid body; ESC, embryonic stem cell; FACS, fluorescence-activated cell sorting; FSC, forward scatter.

Transcriptome of MESP-Marked Progenitor Cells Signified Contribution to the Mesendoderm Lineage

Our strategy, as illustrated by the schematic diagram, was to follow the transcriptome and chromatome of MESP1-fated cells (Fig. 1E). Briefly, UH3 cells were differentiated by hanging drops, then staged YFP+ cells were isolated to follow Mesp1 progeny. Percentages of MESP1 YFP+ cells at each stage are shown in Figure 1D. EYFP signal started to appear as early as day 3 (Fig. 1B), while Mesp1 transcripts peaked at day 4 (Fig. 1C). MESP1 protein was enriched at day 5, as shown by western blot (Fig. 1C). To identify the transcriptome, RNA was isolated from equal numbers of undifferentiated ESCs and sorted MESP1-YFP+ cells at days 5, 6, 7, and 8 for next-generation sequencing analysis (Fig. 2A). Using mouse RefSeq genes as the reference annotation (mm9 version), gene expression profiles (Supporting Information Table S1) were obtained from RNA-Seq data using TopHat 13 and Cufflinks 14.

Figure 2. Transcriptome of MESP-marked progenitor cells signified contribution to the mesendoderm lineage. (A): Transcriptome profiling of MESP1 YFP+ cells using RNA-Seq. (B): Enriched gene ontology (GO) developmental terms associated with upregulated genes. Comparisons were made at multiple time points (days 5, 6, 7, and 8) between YFP+ cells and undifferentiated embryonic stem cells. (C): Correlation of H3K4me3 signal with mRNA expression levels. The H3K4me3 signal for a gene was computed as the normalized read count in a 2-kb interval centered at the transcription start site of the gene. (D): Dynamic expression profiles of Mesp1, Foxa2, Nkx2-5, and Eomes in the form of RNA-Seq reads and H3K4me3 ChIP-Seq reads mapped to the mouse genome. The scales of the RNA-Seq alignment profiles across different days were adjusted for visual comparison. A similar adjustment was made between embryonic stem cells and day 5 cells for the H3K4me3 profiles. (E): Hierarchical clustering of genes associated with the GO terms mesoderm development, endoderm development, and heart development. The robust z-score for each gene across the samples is reported. Cosine similarity was used as the distance metric. A detailed heat map at the individual gene level is accessible in Supporting Information Figure S4. (F): GO terms for differentially expressed genes between YFP+ and YFP− cells at days 5, 6, 7, and 8. Abbreviation: ESC, embryonic stem cell.
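The per-gene scaling and distance used for the clustering in panel (E) can be sketched as follows. The caption does not define the robust z-score, so the median/MAD form with the usual 1.4826 consistency constant is an assumption here, as are the function names and the handling of degenerate (zero-MAD or zero-norm) inputs.

```python
import math
from statistics import median


def robust_z(values):
    """Robust z-scores for one gene across samples (assumed median/MAD definition)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate profile: return zeros rather than dividing by zero (assumption).
        return [0.0 for _ in values]
    return [(v - med) / (1.4826 * mad) for v in values]


def cosine_distance(u, v):
    """1 - cosine similarity between two expression profiles of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an all-zero profile as maximally distant (assumption)
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical expression values for one gene at ESC and days 5, 6, 7, 8.
z = robust_z([1.0, 2.0, 3.0, 4.0, 100.0])
```

A pairwise matrix of `cosine_distance` values over the scaled profiles could then be passed to any standard hierarchical clustering routine.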

To understand the temporal expression patterns of genes in MESP1-marked progenitor cells, we first identified genes that were upregulated (p < 0.05) in day 5 MESP1-YFP+ cells compared with ESCs using DESeq 15 and performed gene ontology (GO) analyses for this gene set (Supporting Information Table S2, Fig. 2B). The most significantly enriched terms were mesoderm GO terms such as heart development (89 gene count with p value of 3.1 × 10^−27), skeletal system development (91 gene count, p value of 5.09 × 10^−20), and vasculature development (80 gene count with p value of 9.41 × 10^−14), followed by neuron development (58 gene count with p value of 4.20 × 10^−10). Currently, there is no well-defined mesendoderm GO term, but many endoderm GO terms such as respiratory system development (36 gene count, p value of 3.33 × 10^−7), pancreas development (14 gene count, p value of 1.57 × 10^−4), gut development/morphogenesis (17 gene count, p value of 2.7 × 21 × 10−6), and liver development (14 gene count, p value of 8.27 × 21 × 10−4) were significantly enriched. Interestingly, genes upregulated in day 6, 7, and 8 MESP1-YFP+ cells also showed the highest enrichment for both mesoderm- and endoderm-associated GO terms (Fig. 2B). We also noticed a general increase in the number of upregulated genes pertaining to developmental GO terms (Supporting Information Table S2) with the progression of differentiation. For instance, the numbers of heart development associated genes at days 5, 6, 7, and 8 were 89, 85, 110, and 115, respectively. The highly enriched endoderm representative term, respiratory tube development, had 32, 34, 43, and 44 upregulated genes in day 5, 6, 7, and 8 YFP+ cells, respectively.
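To make the gene counts and p values above concrete, the following is a minimal sketch of a one-sided hypergeometric enrichment test, a common choice for GO term over-representation. The text does not state which test or background the authors' GO tool used, so the test itself, the function name, and the toy numbers are all assumptions.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes in background,
    K annotated to the GO term, n genes in the upregulated set)."""
    total = comb(N, n)
    upper = min(n, K)
    # comb() returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the tail.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy numbers (hypothetical): 10 background genes, 4 annotated to the term,
# 3 upregulated genes, 2 of which carry the annotation.
p_toy = hypergeom_enrichment_p(k=2, n=3, K=4, N=10)
```

In practice `k` would be the reported gene count for a term (for example 89 for heart development at day 5), with `n`, `K`, and `N` taken from the upregulated set and the chosen background annotation.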

To further validate the enriched transcriptome in the YFP+ cells, we asked whether H3K4me3 modification patterns correlated with the transcriptome quantification. High-quality H3K4me3 ChIP-Seq was performed for ES cells and day 5 YFP+ cells (Supporting Information Fig. S3A, S3B). Next, broad peaks were identified using MACS2 (https://github.com/taoliu/MACS/, an updated version of the method in ref. 16) with an FDR < 0.01. H3K4me3 peaks were preferentially located near transcription start sites of genes (Supporting Information Fig. S3C, S3D, Table S3). Genes with H3K4me3 peaks in their promoters had significantly higher expression levels (more than fivefold higher mean expression) than those without H3K4me3 peaks at their promoters (Fig. 2C). This correlation between the H3K4me3 ChIP peaks and RNA-Seq expression profiles validates the set of actively transcribed genes from our RNA-Seq data, agreeing with the known association of active promoters with H3K4me3 modifications. Among representative genes, Mesp1 itself was transiently expressed on day 5 (Fig. 2D), agreeing with the RT-PCR findings (Fig. 1C). Foxa2, a classic marker of endoderm, was expressed at higher levels in YFP+ marked cells at day 6, after the fall of Mesp1 mRNA (Fig. 2D). Nkx2-5, a classic cardiac marker, was enriched later in YFP+ cells at the final time points (Fig. 2D), while expression of Eomes, a mesendoderm marker, correlated with that of Mesp1 (Fig. 2D). Hierarchical clustering of genes associated with the heart development, mesoderm development, and endoderm development GO terms revealed subsets of genes that were specific for differentiation stages (Fig. 2E; Supporting Information Fig. S4). Next, to determine whether MESP1 YFP+ cells and YFP− cells showed different expression signatures for the mesendoderm lineage, we identified differentially expressed genes using DESeq (p < 0.05). Genes upregulated in YFP+ cells were consistently enriched in heart development related terms at days 5, 6, 7, and 8 (Fig. 2F; Supporting Information Table S4). Differentially expressed genes (at days 5, 6, 7, and 8) between YFP+ and YFP− cells showed stronger enrichment of mesoderm and endoderm lineage terms in YFP+ cells (Fig. 2F; Supporting Information Table S4). Thus, our data indicate that the MESP1-marked cells express a unique molecular signature and are destined to become cardiac progenitors and other subsequent mesendoderm-derived lineages after the transient appearance of Mesp1.
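The promoter H3K4me3 signal described in the Figure 2 caption (normalized read count in a 2-kb interval centered on the TSS) can be sketched as below. The normalization is not specified in the text; reads per million mapped reads is assumed here, and the function name and toy coordinates are illustrative only.

```python
from bisect import bisect_left


def tss_window_signal(read_positions, tss, total_reads, window=2000):
    """Reads per million falling in [tss - window/2, tss + window/2).

    read_positions: sorted read start coordinates on the same chromosome as the TSS.
    total_reads: total mapped reads in the library (for per-million scaling, an assumption).
    """
    half = window // 2
    lo = bisect_left(read_positions, tss - half)
    hi = bisect_left(read_positions, tss + half)
    return (hi - lo) * 1e6 / total_reads


# Toy example: 4 of these reads fall inside the 2-kb window around a TSS at position 1000.
reads = [100, 900, 1000, 1999, 2000, 5000]
signal = tss_window_signal(reads, tss=1000, total_reads=1_000_000)
```

Genes could then be split by whether a called H3K4me3 peak overlaps the same window, and their expression levels compared as in Figure 2C.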

Identification of MESP1 Genomic Binding Sites by ChIP-Sequencing

To identify MESP1-binding sites, we performed MESP1-ChIP on day 4 differentiating ESCs and used next-generation sequencing technology to obtain the enriched DNA fragments in the form of 40-nt single-end raw reads. These raw reads were aligned to the mouse genome allowing a maximum of 2 mismatches per read. The alignment profile of the MESP1 ChIP-Seq data passed the data quality criteria set by ENCODE 17 (Fig. 3A; 19 million uniquely mapped reads, relative strand correlation [RSC] = 1.27, Qtag = 1, normalized strand correlation [NSC] = 1.18). We also generated an appropriate input DNA control from the same cells at day 4. We applied the widely used MACS2 to the MESP1 ChIP-Seq data and the control background to obtain 43,346 peaks (Supporting Information Table S5) with a stringent p value cutoff of 10^−8.
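The quality check can be expressed as a simple threshold test. The sketch below uses the ENCODE minimums quoted in the Figure 3 caption (NSC 1.05, RSC 0.8) and the commonly used definitions of NSC and RSC from strand cross-correlation values; the function names and the example cross-correlation inputs are hypothetical.

```python
ENCODE_MIN_NSC = 1.05  # minimum NSC quoted in the Figure 3 caption
ENCODE_MIN_RSC = 0.8   # minimum RSC quoted in the Figure 3 caption


def nsc(cc_fragment: float, cc_min: float) -> float:
    """Normalized strand correlation: fragment-length peak over background minimum."""
    return cc_fragment / cc_min


def rsc(cc_fragment: float, cc_read: float, cc_min: float) -> float:
    """Relative strand correlation: fragment-length peak vs. read-length ('phantom') peak."""
    return (cc_fragment - cc_min) / (cc_read - cc_min)


def passes_encode_qc(nsc_value: float, rsc_value: float) -> bool:
    """True when both metrics meet the quoted ENCODE minimums."""
    return nsc_value >= ENCODE_MIN_NSC and rsc_value >= ENCODE_MIN_RSC


# Values reported in the text for the MESP1 ChIP-Seq library.
mesp1_ok = passes_encode_qc(nsc_value=1.18, rsc_value=1.27)
```

With the reported NSC of 1.18 and RSC of 1.27, the library clears both thresholds.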

Figure 3. Genome-wide characterization of MESP1 binding sites. (A): Quality control of MESP1 ChIP-Seq data using the ENCODE standard. The cross-correlation curve peaks approximately at the fragment length (120 bp) and not at the read length (40 bp). Other ChIP-Seq quality metrics, such as normalized strand correlation (NSC) and relative strand correlation (RSC), were above the standard suggested by ENCODE (according to ENCODE, minimum NSC and RSC values should be 1.05 and 0.8, respectively). (B): MESP1 peaks annotated as intergenic, promoter (within ±1 kb of the transcription start site), exon, intron, and TTS. (C): Histogram of the number of MESP1 peaks with respect to the nearest transcription start site. MESP1 peaks were found in promoters as well as in distal intragenic or intergenic regions. (D): The top 4 variants of E-box motifs enriched in MESP1 peaks compared with background (p < 10^−16). Statistical significance was computed using a logistic regression of E-box occupancy and adjusted GC content in the MESP1 peaks (Supporting Information Materials and Methods) compared with background sequences. The numbers of MESP1 peaks and background sequences containing each E-box variant, along with motif enrichment scores, are also indicated. (E): Distribution of the top 4 E-box variants along MESP1 peaks. The E-boxes are clustered around the peak summits. (F): Top enriched motifs in MESP1 peaks obtained using two different methods (HOMER and FIMO). Both HOMER and FIMO predicted the same variants (“GC” and “CG”) as the two top motifs in MESP1 peaks, which matches the result in (D). Abbreviations: NSC, normalized strand correlation; RSC, relative strand correlation; TSS, transcription start site.

Genomic annotation of the peaks revealed that about 25% of MESP1 peaks resided in promoters (±1 kb of the nearest transcription start site [TSS]), 26% within introns, 28% in exons, and 19% in intergenic regions (Fig. 3B). Further analysis of the positional distribution of peaks relative to the nearest TSS showed that the majority of the promoter peaks (6,448 out of 9,444) were located within 500 bp of the nearest TSS (Fig. 3B). Interestingly, 50% of peaks were located at intergenic or intragenic distal enhancers (>10 kb from TSS).
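The distance-based part of this annotation can be sketched as follows, using the ±1 kb promoter window and the >10 kb distal cutoff from the text. This is a simplified illustration that ignores strand, chromosome, and exon/intron structure; the function names, the "intermediate" label, and the toy coordinates are assumptions rather than the authors' pipeline.

```python
from bisect import bisect_left


def nearest_tss_distance(peak_summit: int, tss_sorted: list) -> int:
    """Absolute distance from a peak summit to the nearest TSS on the same chromosome.

    tss_sorted must be a non-empty, sorted list of TSS coordinates.
    """
    i = bisect_left(tss_sorted, peak_summit)
    candidates = tss_sorted[max(i - 1, 0): i + 1]  # neighbours on either side
    return min(abs(peak_summit - t) for t in candidates)


def classify_by_distance(distance: int) -> str:
    """Promoter: within 1 kb of a TSS; distal: more than 10 kb away (cutoffs from the text)."""
    if distance <= 1000:
        return "promoter"
    if distance > 10000:
        return "distal"
    return "intermediate"  # hypothetical label for the 1-10 kb range


# Toy TSS coordinates and peak summits (hypothetical).
tss = [1000, 50000]
labels = [classify_by_distance(nearest_tss_distance(s, tss)) for s in (1500, 6000, 20000)]
```

Tallying such labels over all 43,346 peaks would give the kind of proportions reported above.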

MESP1-Bound Sequence Characteristics

MESP1 directly binds gene regulatory regions containing basic helix-loop-helix binding sites or E-boxes 3, 5. To determine the enriched sequence motifs, we first examined E-box sequences in MESP1-bound peaks. A total of 78% of peaks contained at least one canonical E-box (CANNTG). Using a logistic regression model for E-box occupancy and adjusted GC content in the peaks (Supporting Information Materials and Methods), we found that 12 of 16 variants of E-box sequences had significant enrichment (p < 1E-06) in MESP1-bound peaks compared with the random background set (Supporting Information Fig. S5). The 6 highest scoring variants with at least 10% occupancy are shown in Figure 3D, and all 6 variants passed ENCODE's standard of motif occupancy in at least 10% of peaks. The strongest preference was observed for the E-box variant “GC,” followed by “CG” (Fig. 3D). Overall, the variant CASSTG (S stands for C or G) was preferred over the other variants; 26%, 16%, 22%, and 22% of peaks have the CAGCTG, CACGTG, CAGGTG, and CACCTG motif, respectively. These four variants were also clustered in the vicinity of the peak summits (Fig. 3E). To supplement the E-box searches, we used HOMER 18 and FIMO 19 to scan for the presence of known JASPAR 20 binding motifs in MESP1-bound regions. Interestingly, the top enriched motifs by both HOMER and FIMO comprised different versions of E-box motifs (MYCN, MYOD1, TCF3, TCF12, MYOG, AP4, PFTA, MAX) (Supporting Information Fig. S6), the top two variants being “CG” and “GC” (Fig. 3F). This analysis agrees with our earlier observation that CAGCTG and CACGTG are the top two MESP1 preferred binding motifs. Analysis using HOMER revealed differences in the immediate flanking bases for the “CG” and “GC” variants (Fig. 3F). Similar to a previous study on MYOD 21, MESP1 binding is dependent on the flanking and internal nucleotides of the CANNTG E-box.
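The E-box occupancy step can be illustrated with a short scan for CANNTG and a tally of the central dinucleotide per peak. This is only a sketch of the counting, not the logistic regression or GC adjustment described in the Supporting Information; the function names and the toy sequences are hypothetical.

```python
import re
from collections import Counter

# Lookahead so that overlapping E-boxes are all reported; group 1 is the central NN.
EBOX = re.compile(r"(?=CA([ACGT]{2})TG)")


def ebox_variants(seq: str) -> list:
    """Central dinucleotides of every CANNTG site on the given strand, in order."""
    return [m.group(1) for m in EBOX.finditer(seq.upper())]


def variant_occupancy(peak_seqs: list) -> dict:
    """Fraction of peaks containing each E-box variant at least once."""
    counts = Counter()
    for seq in peak_seqs:
        for variant in set(ebox_variants(seq)):
            counts[variant] += 1
    return {variant: n / len(peak_seqs) for variant, n in counts.items()}


def canonical_fraction(peak_seqs: list) -> float:
    """Fraction of peaks with at least one canonical E-box."""
    return sum(1 for seq in peak_seqs if ebox_variants(seq)) / len(peak_seqs)


# Toy peak sequences (hypothetical).
peaks = ["CAGCTG", "AAAA", "CACCTGCAGCTG", "CACGTG"]
occupancy = variant_occupancy(peaks)
```

Because the reverse complement of CANNTG is again CANNTG, scanning one strand finds every site; a site read as “CC” on one strand corresponds to “GG” on the other, which is why such pairs have identical occupancy.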

MESP1 Targets Genes Involved in Mesendoderm Formation

To identify gene targets associated with ChIP-Seq peaks, we compiled a list of 14,006 “potential Mesp1 targets” by assigning each of the 43,346 peaks to the gene whose TSS was closest. We further refined the list by incorporating the transcriptome of the MESP1-YFP+ cells. We retained 6,470 targets that exhibited differential expression (RNA-Seq, p < 0.05) in day 5 YFP+ cells compared with ESCs (Fig. 4A). Of these, 3,201 were MESP1 activated and 3,269 were MESP1 repressed targets (Supporting Information Table S5; Fig. 4A), depending on whether their expression increased or decreased in day 5 YFP+ cells compared with ESCs. Promoters of MESP1 activated targets showed a significantly greater increase in H3K4me3 modification than those of the repressed targets (Supporting Information Fig. S7).
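The two-step filter described above (nearest-gene assignment, then differential expression) can be sketched as a set intersection followed by a split on the direction of change. The data structures, function name, gene labels, and numbers below are hypothetical; the only values taken from the text are the p < 0.05 cutoff and the activated/repressed definition.

```python
def classify_targets(peak_genes, de_results, alpha=0.05):
    """Split peak-associated genes into activated and repressed targets.

    peak_genes: iterable of gene names, one per peak (gene with the nearest TSS).
    de_results: dict mapping gene -> (log2 fold change vs. ESC, p value).
    """
    activated, repressed = set(), set()
    for gene in set(peak_genes):  # a gene may be assigned several peaks
        if gene not in de_results:
            continue
        log2fc, p_value = de_results[gene]
        if p_value >= alpha:
            continue  # not differentially expressed: dropped from the target list
        if log2fc > 0:
            activated.add(gene)
        elif log2fc < 0:
            repressed.add(gene)
    return activated, repressed


# Toy inputs with placeholder gene labels and made-up statistics.
nearest = ["geneA", "geneA", "geneB", "geneC", "geneD"]
de = {
    "geneA": (2.5, 0.001),   # up in day 5 YFP+ cells
    "geneB": (-1.8, 0.01),   # down in day 5 YFP+ cells
    "geneC": (0.4, 0.30),    # not significant
}
up, down = classify_targets(nearest, de)
```

The same function could be reused for the second reference system by supplying differential expression results for day 5 YFP+ versus YFP− cells.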

Figure 4. Functional assessment of MESP1 targets. (A): Stepwise filtering to identify high-confidence MESP1 activation targets. First, MESP1 peaks were identified from 19 million uniquely mapped reads from ChIP-Seq. Each peak was assigned to the gene whose transcription start site was nearest to the peak. The genes that showed differential expression (p < 0.05) between embryonic stem cells and day 5 YFP+ cells were retained as MESP1 direct targets, and we obtained 3,200 MESP1 activated and 3,268 repressed targets, respectively. In a second approach, we compared day 5 YFP+ cells to YFP− cells and obtained 476 activated and 797 repressed targets. (B): Gene ontology (GO) analysis of MESP1 activated and repressed targets. The color scale in the heat map is shown as −log10 p value. (C): Clustering of MESP1 activation targets on the basis of their temporal expression pattern. Hierarchical clustering was performed using the fragments per kilobase of exon per million fragments mapped (fpkm) values at days 0, 5, 6, 7, and 8 in YFP+ cells. Nine maximally homogeneous clusters were obtained; example genes for each cluster are also indicated. Cluster assignment for the MESP1 activated targets can be accessed in Supporting Information Table S5. (D): Individual GO analysis of MESP1 activated genes belonging to the nine clusters. The color scale is indicated as −log10 p value. Abbreviation: ESC, embryonic stem cell.

To determine the biological functions of MESP1 direct targets, we performed separate GO analyses for MESP1 activated and repressed targets using the mouse genome as the background. GO terms pertaining to both mesoderm development (p = 1.89 × 10^−9) and endoderm development (p = 3.98 × 10^−5), a variety of signaling pathways (Shh, TGF, BMP, Wnt, Notch, and FGF), and next-tier GO terms denoting mesendoderm-derived organogenesis were significantly enriched in the MESP1 activated targets (Table 1; Fig. 4B). Thus, MESP1 activation targets are strongly enriched for mesoderm and endoderm developmental GO terms, indicating that MESP1 directly regulates developmental pathways pertaining to mesendoderm rather than strictly a cardiac lineage. In contrast, the MESP1 repressed targets were not enriched in mesendoderm-related terms, but in GO terms pertaining to broad biological processes such as cell cycle processes (Supporting Information Table S5).

Table 1. Gene ontology analysis of high-confidence MESP1 activation targets

Germ layer GO terms enriched for MESP1 activation targets
GOID | Term | Count | OR | p-Value
GO:0007498 | Mesoderm development | 30 | 0.989772 | 1.90E-09
GO:0007492 | Endoderm development | 15 | 0.494886 | 3.98E-05
GO:0007398 | Ectoderm development | 31 | 1.022765 | 0.023407
GO:0048339 | Paraxial mesoderm development | 6 | 0.197954 | 0.055491

Developmental and signaling GO terms enriched for MESP1 activation targets
Germ layer | GOID | Term | Count | OR | p-Value
MES | GO:0007507 | Heart development | 107 | 3.530188 | 1.75E-29
MES | GO:0060429 | Epithelium development | 101 | 3.332234 | 5.28E-18
MES | GO:0001501 | Skeletal system development | 102 | 3.365226 | 8.69E-17
MES | GO:0001944 | Vasculature development | 91 | 3.002309 | 2.04E-15
MES | GO:0001568 | Blood vessel development | 88 | 2.903332 | 1.14E-14
MES | GO:0060485 | Mesenchyme development | 33 | 1.08875 | 1.50E-14
MES | GO:0051216 | Cartilage development | 38 | 1.253712 | 4.28E-11
MES | GO:0001822 | Kidney development | 46 | 1.517651 | 5.89E-11
MES | GO:0014706 | Striated muscle tissue development | 51 | 1.682613 | 9.18E-11
MES | GO:0060348 | Bone development | 44 | 1.451666 | 2.86E-08
MES | GO:0048738 | Cardiac muscle tissue development | 25 | 0.82481 | 1.70E-06
MES | GO:0048534 | Haematopoietic or lymphoid organ development | 61 | 2.012537 | 0.006996
EN | GO:0060541 | Respiratory system development | 44 | 1.451666 | 1.48E-07
EN | GO:0030324 | Lung development | 39 | 1.286704 | 1.11E-06
EN | GO:0048565 | Gut development | 19 | 0.626856 | 7.08E-06
EN | GO:0031016 | Pancreas development | 18 | 0.593863 | 1.37E-05
EN | GO:0055123 | Digestive system development | 17 | 0.560871 | 1.69E-05
EN | GO:0001889 | Liver development | 18 | 0.593863 | 1.41E-04
EC | GO:0031175 | Neuron projection development | 85 | 2.804355 | 1.24E-16
EC | GO:0048666 | Neuron development | 97 | 3.200264 | 1.55E-13
EC | GO:0014032 | Neural crest cell development | 22 | 0.725833 | 6.27E-10
SIG | GO:0016055 | Wnt receptor signaling pathway | 61 | 2.012537 | 1.60E-16
SIG | GO:0030509 | BMP signaling pathway | 17 | 0.560871 | 2.72E-08
SIG | GO:0007219 | Notch signaling pathway | 25 | 0.82481 | 5.12E-07
SIG | GO:0007179 | Transforming growth factor beta receptor signaling pathway | 23 | 0.758825 | 7.96E-07
SIG | GO:0007224 | Smoothened signaling pathway | 13 | 0.428901 | 5.66E-04
  • Count: number of genes associated with the GO term.
  • Abbreviations: EN, endoderm; EC, ectoderm; MES, mesoderm; OR, the odds ratio between the numbers of predicted genes and observed genes in the GO term; SIG, signaling pathways.

To better understand the dynamic expression patterns of MESP1 activated targets, we next performed unsupervised hierarchical clustering of their expression profiles (ESCs and YFP+ cells from days 5, 6, 7, and 8) (Fig. 4C). Our analysis revealed nine groups of targets showing unique temporal expression patterns (Fig. 4C). Cluster assignment for the targets is provided in Supporting Information Table S5. To further understand the functions associated with these clusters of targets, we performed GO analysis for each cluster. Interestingly, targets whose expression transiently peaks at day 5 (cluster 9 in Fig. 4C) showed the highest enrichment in gastrulation, pattern specification, and embryonic morphogenesis (Fig. 4D). Mesendoderm-derived organogenesis was associated with MESP1 targets whose expression became most prominent at day 6 or beyond (Fig. 4D). Such patterns indicate that MESP1 target genes form a coordinated gene network that drives mesendoderm formation and, subsequently, lineage specification.

We also filtered the “potential Mesp1 targets” through differentially expressed genes between day 5 MESP1 YFP+ and YFP− cells (Fig. 4A, right branch). This comparison yielded 475 activated and 796 repressed targets. This set of MESP1 activated targets also showed strong enrichment for mesoderm (p = 1.13 × 10^−7) and endoderm development (p = 0.0046), while the repressed targets showed no enrichment for these terms (Fig. 4B; Supporting Information Table S5). GO terms for ectoderm development showed enrichment in repressed targets (Fig. 4B) but not in activated targets. This indicates that MESP1 activated targets contribute to mesoderm and endoderm programs. Other GO terms associated with mesendoderm-derived organogenesis also showed enrichment among MESP1 activated targets when referenced to day 5 YFP− cells (Fig. 4B). The 797 MESP1 repressed targets showed small enrichment for endoderm-associated terms pertaining to lung development (p = 0.0016) and pancreas development (p = 0.0153). However, these GO terms also showed enrichment in MESP1 activated targets. Terms related to gut development were absent in the repressed targets, which indicates that MESP1 YFP+ cells contribute more to gut lineages than the YFP− population. Using two different reference systems (ESCs and day 5 YFP− cells), we identified two sets of MESP1 direct activated targets, and both sets showed strong enrichment of both mesoderm and endoderm GO terms. Thus, MESP1 directly regulates developmental pathways pertaining to mesendoderm rather than strictly a cardiac lineage.

Because the GO term “mesendoderm development” is only partially completed 22, we tested whether MESP1 targets key mesendoderm regulators by investigating individual genes that are directly involved in mesendoderm development. Mesendoderm markers such as Gata4, Eomes, Wnt5a, Wnt5b, Mixl1, T, Gsc, and Wnt3 were among the MESP1 activated targets (Fig. 5A). We found that MESP1 may regulate itself, perhaps through a peak located about −8 kb from the TSS (Fig. 6E). Hematopoietic transcription factors such as Tal1, Meis1, and Lmo2 were also direct MESP1 activated gene targets, while other key regulatory blood factors (Gata1, Hbby, and Ebf1) and cardiac progenitor markers (Nkx2-5, Mef2c, and Tnnt2) were not (Supporting Information Table S5; Fig. 5C). In YFP+ cells, Mesp1's temporal expression pattern correlated well with the appearance of mesendoderm gene expression (such as Gata4, Mixl1, Wnt5b, Wnt3, and T) while differing significantly from that of key cardiac and blood genes (Fig. 5C); thus, MESP1 indirectly regulates the cardiac or hematopoietic differentiation programs in a discrete MESP1-enriched population 23. We also examined the expression of these key mesendoderm markers in the YFP− population. Most of the mesendoderm markers except T showed significantly higher expression at day 5 in YFP+ cells than in YFP− cells (Fig. 5A). MESP1 activated targets that are involved in cardiac and hematopoietic programs are expressed at higher levels in YFP+ cells than in YFP− cells at later stages of differentiation (after day 5), indicating an indirect regulation of these programs by MESP1 (Fig. 5A).


MESP1 directly regulates mesendoderm genes. (A): mRNA expression levels, differential expression between YFP+ and YFP− cells, and ChIP-Seq peak scores of key mesendoderm, endoderm, cardiac, and blood genes. For each lineage-specific group of genes, there are three heat maps placed in three subpanels. The first heat map represents expression profiles of the key markers in embryonic stem cells (ESCs), YFP+ cells, and YFP− cells at different time points. The middle heat map indicates whether the gene was differentially expressed between YFP+ cells and ESCs/YFP− cells at different time points. In this heat map, “red,” “blue,” and “white” colors indicate up-regulation in YFP+, down regulation in YFP, and not differentially expressed, respectively. The third heat map represents the peak score from the MESP1 ChIP-Seq associated with the gene. The color scale of the peak score is shown as log2 of read count. Where multiple peaks were assigned to the same gene, the one with the highest score was used. Peak scores for genes that are not MESP1 activated targets are shown as blank. (B): Coimmunostaining analysis of Mesp1-lineage and T, Foxa2, and Sox17. The arrows point to cells that show nuclear T, Foxa2, or Sox17 staining and cytosolic YFP staining. Abbreviations: DAPI, 4′,6-diamidino-2-phenylindole; ESC, embryonic stem cell; GFP, green fluorescent protein.
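The per-gene peak score described in the caption (highest-scoring peak per gene, displayed as log2 of read count) can be sketched as follows. The function name `best_peak_score` and the input layout are assumptions made for illustration, and the identifiers are placeholders rather than data from this study.

```python
from math import log2


def best_peak_score(peaks):
    """Return {gene: log2(read count)} using the highest-scoring peak per gene.

    peaks: iterable of (gene, read_count) pairs with read_count > 0.
    Genes without any assigned peak are simply absent from the result,
    which corresponds to a blank cell in the heat map.
    """
    best = {}
    for gene, count in peaks:
        if count > best.get(gene, 0):
            best[gene] = count
    return {gene: log2(count) for gene, count in best.items()}


# Placeholder peaks for illustration only.
example_peaks = [("geneA", 8), ("geneA", 32), ("geneB", 16)]
scores = best_peak_score(example_peaks)
print(scores)
```

Because log2 is monotonic, taking the maximum before or after the transform selects the same peak.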

Previous studies using MESP1 overexpression followed by ChIP-PCR reported that MESP1 bound to conserved E-box sites in regions close to the TSS of Foxa2 (5 kb), Sox17 (4 kb), Gsc (4.5 kb), and T (1.5 kb), leading to rapid down-regulation of these genes 5, 23. In our unbiased approach, the endoderm markers Foxa2 and Sox17 were MESP1 activated targets (Fig. 5A). It is possible that, upon forced induction, MESP1 binds to the regions close to the TSS indicated by the previous study 7 and down-regulates these important endoderm markers, likely through an inhibitory complex. In YFP+ cells, the endoderm genes Foxa2, Cer1, and Sox17 began to appear at day 5 and peaked at day 6, while Mesp1 levels fell and disappeared. These data suggest that a subset of MESP1 YFP+ cells shows definitive endoderm (DE) lineage characteristics at a later stage, and that this does not depend on continuous MESP1 gene expression. Sox17, Foxa2, and Cer1 were expressed at higher levels in YFP− cells at days 5, 6, and 8 (only Foxa2), indicating that YFP− cells primarily contribute to the endoderm lineage. To test this idea, we costained YFP+ cells with T, Foxa2, or Sox17 in day 5 differentiated cells (Fig. 5B). Both T and Foxa2 were highly prevalent, consistent with efficient mesendoderm formation. YFP signals were located in the cytosol and were frequently detected in T- or Foxa2-positive cells. Sox17 signals were restricted to a small number of clustered cells, consistent with limited endoderm differentiation. Costaining of Sox17 and YFP was also evident. Although T, Foxa2, and Sox17 were more prevalent in YFP− cells, our data support the view that MESP1 is associated with at least a population of mesendoderm cells that contribute to endoderm lineages, but only after the reduction of Mesp1 gene activity.

We further explored GO terms associated with MESP1 activated targets that were differentially expressed between YFP+ and YFP− cells at different time points during differentiation. For this, we compiled two lists of MESP1 activated targets. The first group consists of targets that were strictly expressed at higher levels in YFP+ cells during differentiation, while the second group consists of targets that were strictly expressed at higher levels in YFP− cells during differentiation. The first group showed more enrichment in mesoderm and in the majority of endoderm-derived organogenesis terms (Supporting Information Fig. S8). However, some of the endoderm terms, such as respiratory and lung development, were also enriched in the second group. This indicates that some of the genes that are activated after MESP1 expression in the YFP+ cells are also expressed in the YFP− population and may contribute to endoderm lineages. Such genes are most likely regulated by other factors, independent of MESP1, in the YFP− population.
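As a rough illustration of how an enrichment p value for a GO term can be obtained for a gene list, the sketch below computes a one-sided hypergeometric tail probability. This is an assumption made for illustration: the text does not state which statistical test or GO tool produced the reported p values, and the function name `hypergeom_enrichment_p` and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """One-sided enrichment p value, P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes carrying the GO term
    sample:     size of the gene list being tested (e.g., a target list)
    hits:       genes in the list that carry the GO term
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers for illustration only; not values from this study.
p_value = hypergeom_enrichment_p(population=20, annotated=5, sample=4, hits=2)
print(f"enrichment p value: {p_value:.4f}")
```

In practice such p values are usually corrected for testing many GO terms at once; that step is omitted here.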

MESP1 May Recruit H3K27ac Epigenetic Modification to Distal Regulatory Regions

To assess whether MESP1 binding sites might modify local chromatin structure, we compared MESP1 ChIP-Seq data to H3K27ac ChIP-Seq data from enriched mesoderm cells 24, and also to H3K4me3 tracks generated in this study. In the 43,346 enriched peak regions, the amount of H3K27 acetylation showed strong correlation (Pearson correlation coefficient = 0.91, Fig. 6A) with the MESP1 binding signal. A similar correlation was also observed (Pearson correlation coefficient = 0.84, Fig. 6B) in intergenic or intragenic distal peaks (>10 kb from TSS). To determine whether H3K27 acetylation at MESP1-bound regions is MESP1-binding dependent, we compared the amount of H3K27 acetylation at MESP1-bound regions in mesoderm cells to that in ESCs. An increase in the amount of H3K27 acetylation was observed between ESCs and mesoderm cells at MESP1-bound peaks, and also when only distal enhancers were considered (Supporting Information Fig. S9; Fig. 6C, 6D). The core mesendoderm genes Gata4, Wnt3, Gsc, Gata6, and Mixl1 displayed enhanced H3K27ac histone modification in MESP1-binding regions and heightened H3K4me3 epigenetic modification at their promoters, in comparison with virtually nil levels in ESCs (Fig. 6). Mesp1 itself displayed de novo appearance of H3K27ac modification at endogenous MESP1 binding regions, but H3K4me3 was already present in replicating ESCs (Fig. 6E), likely due to activation by Oct4 and Lef1/β-catenin 25. We propose that MESP1 target sites that incorporate the genetic signature of increased H3K27ac and H3K4me3 are likely participants in the acquisition of more differentiated states and constitute an essential mechanism for the transition from pluripotency to mesendoderm.
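The correlation analysis above can be sketched as a Pearson correlation between per-peak MESP1 and H3K27ac signals, computed once over all peaks and once over distal peaks only. This is a minimal illustration under stated assumptions: the tuple layout (signed distance to TSS in bp, MESP1 signal, H3K27ac signal), the helper names, and the toy values are hypothetical, and how signals were quantified per peak in this study is not specified here.

```python
from math import sqrt

DISTAL_CUTOFF_BP = 10_000  # ">10 kb from TSS", as stated in the text


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def distal_only(peaks, cutoff_bp=DISTAL_CUTOFF_BP):
    """Keep peaks lying more than cutoff_bp from the TSS (either direction)."""
    return [p for p in peaks if abs(p[0]) > cutoff_bp]


def signal_correlation(peaks):
    """Correlate MESP1 signal (index 1) with H3K27ac signal (index 2)."""
    return pearson_r([p[1] for p in peaks], [p[2] for p in peaks])


# Toy peaks: (distance to TSS in bp, MESP1 signal, H3K27ac signal).
peaks = [
    (-500, 5.0, 6.1), (12_000, 3.0, 3.5), (-25_000, 8.0, 8.4),
    (40_000, 2.0, 2.9), (3_000, 6.0, 5.2), (-15_000, 4.0, 4.6),
]
r_all = signal_correlation(peaks)
r_distal = signal_correlation(distal_only(peaks))
print(f"all peaks: r = {r_all:.2f}; distal peaks: r = {r_distal:.2f}")
```

A high coefficient in such an analysis indicates co-variation of the two signals; on its own it does not establish that one mark depends on the other, which is why the comparison with ESCs is needed.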


MESP1-binding correlates globally with H3K27ac epigenetic modification. (A): H3K27ac signal correlates globally with MESP1 ChIP signal. At all the MESP1 peak locations, the H3K27ac ChIP signal correlated with MESP1 ChIP signal with a Pearson correlation of 0.91. (B): H3K27ac signal correlates with MESP1 ChIP signal at MESP1 enhancer peaks. At all the MESP1 peak enhancer locations, the H3K27ac ChIP signal in mesoderm cells correlated with MESP1 ChIP signal with a Pearson correlation of 0.84. (C): Increase in H3K27ac signal at MESP1 peak regions from embryonic stem cells (ESCs) to mesoderm cells. (D): Increase in H3K27ac signal at MESP1 peak enhancer regions from ESCs to mesoderm cells. (E–J): Alignment profiles of ChIP-Seq reads at loci of selected MESP1 mesendoderm targets. The alignment profiles represent the read depth at a resolution of 1 nucleotide. Profiles are shown for Mesp1, Gsc, Gata4, Gata6, Wnt3, and Eomes. The identified MESP1 binding regions that are enriched compared with background ChIP-Seq data with p < 10⁻⁸ are shaded in gray. The scales for H3K27ac plots between ESCs and day 5 were adjusted for visual comparison between the two conditions. A similar adjustment was made for H3K4me3 plots between ESCs and day 5. Abbreviation: ESC, embryonic stem cell.

Discussion

Mesendoderm, an ancient germ layer from worms to frogs, gives rise to both endoderm organs such as liver, foregut, and pancreas and mesoderm organs such as heart, blood, and bone 26. Mesendoderm is implicated as the major source of cardiac mesoderm and anterior endoderm in mammals, a part of which can eventually differentiate into hepatocytes and pancreatic cells 27. From C. elegans up to Xenopus, NODAL, WNT5A/B, PITX2, GSC, MIXL1, EOMES, and GATA4/6 have been shown to participate in mesendoderm induction 28, 29. The origin of mammalian mesendoderm has not been as well studied, but mesoderm and endoderm cell–cell interactions are especially important. Our model depicts how MESP1 directs a mesendoderm bi-potential developmental pathway and serves as a novel paradigm shift for MESP's role in the cardiopoiesis program. We profiled the transcriptome of a pure population of MESP1-marked cells along with determining the chromatome of endogenous MESP1 bound DNA targets. Surprisingly, MESP1 primarily directs the appearance of mesendoderm instead of the cardiac program per se. Critical mesendoderm modulators including Mixl1, Pitx2, Gata4, Gata6, Wnt5a, Wnt5b, Sox17, and Foxa2 were enriched in MESP1 YFP+ cells before the appearance of cardiac progenitors and myocytes.

Previously, we demonstrated that an endoderm-associated Sry-box transcription factor, SOX17, was essential for cardiac specification in differentiating mESCs, acting at least in part via cell-non-autonomous mechanisms 30. Recently, we showed by unbiased genome-wide testing that SOX17 expression in ESCs was a prerequisite for the induction of highly diverse cardiogenic transcription factors and cardiac structural genes 31. HHEX and CER1 are indispensable components of the SOX17 pathway for cardiopoiesis in mESCs, acting at a stage downstream from MESP1/2. Our demonstration that Sox17, Foxa2, and Cer1 are MESP1 targets and that their transcripts become enriched in MESP1 YFP+ cells supports the idea that mesendoderm cells initiate a potent auto-regulatory program to direct the earliest cardiac and endoderm progenitors.

We used a stepwise filtration to identify “MESP1 activation targets”. MESP1 ChIP-Seq alone, however, resulted in a total of 43,346 peaks, the majority of which were distributed across regions both distal and close to the TSS. This was surprising, as most embryonic transcription factors described in the current literature target sites around the TSS. The peaks were significant at a stringent p value of 10⁻⁸ compared with the background, our ChIP-Seq data passed the high-quality criteria recommended by ENCODE, and the peaks correlated with H3K27 acetylation, indicating that they are specific. Interestingly, ChIP with another lineage-specifying bHLH factor, MyoD, in myoblasts and myotubes yielded a large number of reads in distal enhancer regions, which also showed increased H4 acetylation 21. Why these regulatory regions harbor activating histone modifications without net downstream activation is intriguing. Perhaps the bHLH-recruited histone acetylation leads the genes to a “poised” status, ready for expression in the next stage of cellular differentiation, which is an interesting direction to pursue in future studies.

Forced expression studies are held with some degree of skepticism when their activity is extrapolated to the behavior of the endogenous factor. Recently, Tapscott and colleagues provided a litany of cogent reasons why the legitimacy of forced expression studies may be questionable, such as non-physiological conditions and non-specific DNA binding, even though, in point of fact, they did not detect significant differences between their unbiased MyoD ChIP assays and overexpression data 32. We compiled a list of DNA regions that were previously tested for MESP1 enrichment using ChIP-PCR 5. About 25% of these previously identified MESP1-enriched regions were detected by our MESP1 ChIP-Seq. We also compared the genome-wide binding of endogenous MESP1 against the microarray analysis following MESP1 overexpression (Table 1 of the published work by the Blanpain group) 33. Of 216 potential MESP1 DNA binding targets, 155 (145 activated and 10 repressed targets in our data) were confirmed (Supporting Information Table S6). MESP1 was claimed to promote the expression of cardiac structural genes, such as Myh7 (β-MHC), Myh6 (α-MHC), Myl1 (MLC1f), Myl2 (MLC2v), and Tnnt2 (cTnT), but none of these contractile protein genes, which appear almost 2–3 days after the transient induction of MESP1, were identified as direct targets. We also found concordance between our findings and those published by Lindsley et al., in which transient MESP1 expression in ESCs markedly increased the frequency of PDGFRα+ and FLK1+ cells 34. We noted that Pdgfrα and Flk1 are MESP1 targets. Also, MESP1 robustly induces transcription factors that regulate EMT, such as Snai1 and Twist 35, 36, which were also confirmed by our study as MESP1 gene targets (Supporting Information Table S5). Furthermore, MESP1 failed to induce paraxial mesoderm genes, such as Meox1, Tcf15, Tbx6, or Pax1 37, or the skeletal myogenic transcription factors Myod, Myogenin, or Myf5.

Besides DE markers such as Foxa2, Cer1, and Sox17, MESP1 also binds to DNA regions associated with visceral endoderm (VE) markers such as Sox7 and Gata5 (Fig. 5A). VE cells have been shown to become integrated into DE rather than being displaced by it at a later stage. Expression of DE and VE markers is more prominent after the disappearance of MESP1 in YFP+ cells (Fig. 5A), and the DE markers are more enriched in YFP− cells. These data suggest that MESP1 YFP+ cells contribute to both the VE and DE programs, but only after the disappearance of MESP1.

We and others previously reported that Mesp1 transcription is under the regulation of canonical Wnt and T-box factors (T and Eomes) 25, 38, 39. New data from this study suggest that MESP1 regulates the expression of Eomes, Wnt3a, and Mesp1 itself. Together, MESP1 and these early factors form a self-regulatory network that drives formation of mesendoderm or a subset of it. Through epigenetic marks, MESP1 further specifies downstream cardiac genes, though at a later stage when MESP1 itself is no longer expressed. Our study using MESP1 YFP+ cells reveals that MESP1 expression correlates with key mesendoderm markers. First heart field (FHF) markers such as Fgf8, Meis2, and Tbx5 and second heart field (SHF) markers such as Smarcd3, Hoxb2, Hoxa1, and Cited1 40 have their highest expression after day 5. These observations indicate that MESP1 YFP+ cells are not primarily restricted to the FHF and SHF, but can give rise to other mesoderm lineages, including the hematopoietic lineage, and to endoderm lineages.

To achieve long-term cardiac cell therapy, it is important to understand the regulatory network of cardiac progenitors. In particular, our group found that MESP1 and ETS2 were able to transdifferentiate human dermal fibroblasts into cardiac progenitors 41. Other cocktails that successfully converted human fibroblasts to cardiomyocyte-like cells also included MESP1 as one of the transcription factors 42. This study demonstrates that MESP1 primarily drives the activation of genes involved in mesendoderm lineages, which potentially promotes interaction between different germ layers (mesoderm and endoderm), leading to cardiac, blood, and skeletal muscle differentiation. We believe that further elucidation of the genetic makeup of the MESP1-dependent regulatory elements would provide novel strategies for driving mesendoderm in tissue regeneration.

Conclusion

Our unbiased whole-genome analyses support the conclusion that MESP1 chiefly targets and activates genes pertinent to the mesendoderm program, and thus initiates a potent auto-regulatory program to direct the earliest cardiac progenitors. It may also recruit histone acetylation to target genes to establish a “poised” status, ready for expression in mesendoderm-derived organogenesis.

Author Contributions

B.S.: assembly of data, data analysis and interpretation, and manuscript writing; A.B., J.K., K.-C.W., L.Y., and M.R.: collection of data and data interpretation; A.A.: collection of data; X.X. and A.J.C.: provision of study material; R.J.S. and Y.L.: conception and design, data analysis and interpretation, financial support, manuscript writing, and final approval of manuscript. B.S., A.B., J.K., and K.-C.W. contributed equally to this article.

Disclosure of Potential Conflicts of Interest

The authors indicate no potential conflicts of interest.