Authors: Michael C.U. Hammond-Kosack, Robert King, Kostya Kanyuka, Kim E. Hammond-Kosack
We present a data set of promoter and 5′UTR sequences from the homoeo-alleles of 459 wheat genes that contribute to agriculturally important traits, obtained from 95 ancestral and commercial wheat cultivars. The high-stringency myBaits technology used here made it possible to capture the promoters of individual homoeo-alleles, which is reported here for the first time. Promoters of most genes are remarkably conserved across the 83 hexaploid cultivars analysed, with fewer than seven haplotypes per promoter and 21% of promoters identical to the reference Chinese Spring. InDels and many high-confidence SNPs are located within predicted plant transcription factor binding sites and could therefore alter gene expression. Most haplotypes found in the Watkins landraces and a few found in Triticum monococcum, germplasm hitherto not thought to have been used in modern wheat breeding, are already present in many commercial hexaploid wheats. The full data set, which is useful for genomic and gene function studies and for wheat breeding, is available at https://rrescloud.rothamsted.ac.uk/index.php/s/DMCFDu5iAGTl50u/authenticate.
Introduction
Wheat provides about one fifth of the calories consumed by humans globally and is the greatest single source of protein in the human diet (FAOSTAT, 2017a,b). A sustainable and resilient wheat crop that can meet the nutritional demands of the ever-growing human population is therefore essential for global food security. Plant breeders strive continually to improve varieties by manipulating genetically complex yield and end-user quality traits while maintaining yield stability, improving nutrient use efficiencies and providing regional adaptation to specific abiotic and biotic stresses, such as the ever-increasing number of pathogen and pest threats (Atlin et al., 2017; Bonjean and Angus, 2001; Fisher et al., 2012).
A fully annotated, high-quality sequence assembly of the large and complex hexaploid wheat genome (2n = 6x = 42; AABBDD), IWGSC_refseq_v1.0, was used (The IWGSC et al., 2018). The 14.5-Gbp genome of the wheat landrace Chinese Spring (CS) contains nearly 270 000 genes, of which 107 891 were predicted with high confidence. The development of a gene expression atlas covering all stages of wheat development, together with the accurate genome assembly, has enabled the discovery of tissue- and developmental stage-related gene co-expression networks (The IWGSC et al., 2018) and an exploration of the relative expression levels of the homoeo-alleles of each predicted gene on the A, B and D sub-genomes (Allen et al., 2017; Arora et al., 2019; Ramírez-González et al., 2018; Winfield et al., 2018).
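To make "relative expression levels of the homoeo-alleles" concrete, a minimal Python sketch follows. It assumes one simple normalization: each sub-genome copy's expression is divided by the total for its A/B/D triad, so the three values sum to 1. This is an illustration only and not necessarily the procedure used in the cited studies; the function name and the TPM values are hypothetical.

```python
from typing import Dict


def relative_homoeolog_expression(tpm: Dict[str, float]) -> Dict[str, float]:
    """Share of a triad's total expression contributed by each sub-genome
    copy (e.g. keys "A", "B", "D"). Returns zeros if the triad is not expressed."""
    total = sum(tpm.values())
    if total == 0:
        return {genome: 0.0 for genome in tpm}
    return {genome: value / total for genome, value in tpm.items()}


if __name__ == "__main__":
    # Hypothetical TPM values for one A/B/D triad in a single tissue.
    print(relative_homoeolog_expression({"A": 10.0, "B": 5.0, "D": 5.0}))
    # {'A': 0.5, 'B': 0.25, 'D': 0.25}
```

With these made-up values the A copy contributes half of the triad's expression and the B and D copies a quarter each.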
Phenotypic variation in a trait is thought to arise from variation in the coding DNA sequences (CDS) of the genes underlying the trait, as well as from environmental factors and gene-by-environment interactions. However, accumulating evidence suggests that mutations within regulatory regions may be equally important in generating significant phenotypic differences (Li et al., 2012; Wray, 2007). Polymorphisms in sequences regulating gene expression may therefore be important in shaping natural trait variation in wheat and other plant species.
Here we investigated the variation in the sequences located within 1700 nucleotides upstream of the CDS (spanning 5′UTRs and potential promoters, and for simplicity hereafter referred to as ‘promoters’) of 459 wheat genes associated with agriculturally important traits, in ancestral, synthetic, historic and modern wheat genotypes (Allen et al., 2017; Winfield et al., 2018). The main practical objective was to determine whether current target capture sequencing technology, which has so far mostly been used for analysing variation in exons and for gene-specific marker discovery (Arora et al., 2019), could also be used to effectively capture and sequence the promoters of homoeologous wheat genes. The main scientific aims were to (i) compare the promoter variation (haplotypes) present in different wheat genotypes and assess levels of polymorphism between wheat species of different ploidy levels, (ii) assess promoter sequence variation in ancestral wheat and commercial wheat cultivars, (iii) determine whether any of the identified polymorphisms are located at recognized regulatory motifs (transcription factor binding sites, TFBS), (iv) determine whether large deletions are associated with the insertion or deletion of repetitive elements and (v) explore whether ancient species may already have contributed to modern wheat breeding.
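Several of the analysis steps named above can be sketched in Python under simplifying assumptions. The hypothetical helpers below (a) compute the 1700-nt window 5′ of a CDS on either strand, (b) collapse identical promoter sequences into haplotypes and (c) report substitutions that fall within predicted TFBS intervals. The sketch assumes 0-based, half-open coordinates and equal-length, ungapped sequences; real promoter comparisons need a multiple alignment to handle InDels, and this is not the authors' pipeline. All accession names, sequences and motif labels are invented toy data.

```python
from typing import Dict, List, Tuple

UPSTREAM_NT = 1700  # window size upstream of the CDS, as stated in the text


def promoter_window(cds_start: int, cds_end: int, strand: str,
                    chrom_len: int, upstream: int = UPSTREAM_NT) -> Tuple[int, int]:
    """0-based, half-open interval covering `upstream` nt 5' of a CDS.

    cds_start/cds_end span the CDS on the forward-strand coordinate axis.
    For minus-strand genes the 5' side lies at higher coordinates.
    The window is clipped to the chromosome bounds.
    """
    if strand == "+":
        return max(0, cds_start - upstream), cds_start
    if strand == "-":
        return cds_end, min(chrom_len, cds_end + upstream)
    raise ValueError(f"unexpected strand: {strand!r}")


def collapse_haplotypes(seqs: Dict[str, str]) -> Dict[str, List[str]]:
    """Group accessions that carry an identical promoter sequence."""
    groups: Dict[str, List[str]] = {}
    for accession, seq in seqs.items():
        groups.setdefault(seq.upper(), []).append(accession)
    return groups


def substitution_positions(ref: str, alt: str) -> List[int]:
    """Positions where two equal-length, ungapped sequences differ.
    InDels need a proper alignment and are not handled here."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be the same length")
    return [i for i, (r, a) in enumerate(zip(ref.upper(), alt.upper())) if r != a]


def snps_in_tfbs(positions: List[int],
                 tfbs: List[Tuple[int, int, str]]) -> List[Tuple[int, str]]:
    """(position, motif) pairs where a SNP lies inside a predicted TFBS
    given as 0-based, half-open (start, end, motif) intervals."""
    return [(p, motif) for p in positions
            for start, end, motif in tfbs if start <= p < end]


if __name__ == "__main__":
    # Toy data only: accession names, sequences and the motif are hypothetical.
    promoters = {
        "Chinese_Spring": "ACGTACGTAA",
        "cultivar_1": "ACGTACGTAA",
        "cultivar_2": "ACGTTCGTAA",
        "landrace_1": "ACGTTCGTAA",
    }
    ref = promoters["Chinese_Spring"]
    print("haplotypes:", len(collapse_haplotypes(promoters)))
    snps = substitution_positions(ref, promoters["cultivar_2"])
    print("SNPs in TFBS:", snps_in_tfbs(snps, [(2, 8, "motif_X")]))
```

For the toy data the script prints `haplotypes: 2` and `SNPs in TFBS: [(4, 'motif_X')]`. Grouping by exact sequence identity keeps the haplotype definition simple; a real analysis would also need to decide how to treat ambiguous base calls and low-coverage positions.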
Gene and germplasm selection
For this study, ten traits of commercial importance for wheat improvement were selected, and known or candidate genes underlying these traits were collated by dedicated trait coordinators (see Acknowledgements). In total, 459 wheat genes of interest, with 1273 unique homoeo-allele sequences, were chosen for sequence capture and detailed analyses (Table 1 and Data S1). The distribution of the selected genes across the Chinese Spring (CS) chromosomes (IWGSC_refseq_v1.0) is shown in Figure S1. The germplasm to be analysed was chosen collaboratively by the UK wheat community (see Acknowledgements) and comprised 69 historic and modern commercial hexaploid wheat (Triticum aestivum) cultivars including CS, 15 wheat landraces (T. aestivum) from the A. E. Watkins collection (Winfield et al., 2018; Wingen et al., 2014), eight T. monococcum (2n = 2x = 14; AᵐAᵐ) accessions (Jing et al., 2007; Li et al., 2018; McMillan et al., 2014; Simons et al., 2021) and single accessions of T. durum (2n = 4x = 28; AABB), Aegilops tauschii (2n = 2x = 14; DD), Ae. speltoides (ASP) (2n = 2x = 14; SS) and the wild species Ae. peregrina (APG) (2n = 4x = 28; SᵛSᵛUU) (Table S1, Data S2).
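The chromosome-number notation used for the germplasm follows 2n = (ploidy level) × x, where x = 7 is the basic chromosome number of wheat and its relatives, and each capital letter in a genome formula such as AABBDD stands for one chromosome set. A small Python check of the species listed above makes this explicit; the data structure and function name are illustrative only, and superscript modifiers in the formulae are deliberately not counted as extra sets.

```python
from typing import List, Tuple

BASE_X = 7  # basic chromosome number (x) of wheat and related Triticeae

# (species, ploidy level, somatic chromosome number 2n, genome formula),
# transcribed from the germplasm description; superscripts mark variant genomes.
PANEL: List[Tuple[str, int, int, str]] = [
    ("Triticum aestivum", 6, 42, "AABBDD"),
    ("Triticum monococcum", 2, 14, "AᵐAᵐ"),
    ("Triticum durum", 4, 28, "AABB"),
    ("Aegilops tauschii", 2, 14, "DD"),
    ("Aegilops speltoides", 2, 14, "SS"),
    ("Aegilops peregrina", 4, 28, "SᵛSᵛUU"),
]


def notation_is_consistent(ploidy: int, two_n: int, genome_formula: str) -> bool:
    """True if 2n == ploidy * x and the formula lists one capital letter
    per chromosome set (superscript modifier letters are not capitals)."""
    chromosome_sets = sum(ch.isupper() for ch in genome_formula)
    return two_n == ploidy * BASE_X and chromosome_sets == ploidy


if __name__ == "__main__":
    for species, ploidy, two_n, formula in PANEL:
        status = "ok" if notation_is_consistent(ploidy, two_n, formula) else "MISMATCH"
        print(f"{species}: 2n = {ploidy}x = {two_n} ({formula}) {status}")
```

Every entry in the panel passes the check, which confirms that the ploidy levels, chromosome numbers and genome formulae given in the text agree with one another.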