Hi-C Metagenomics: Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products
Metagenomics is a valuable tool for the study of microbial communities but has been limited by the difficulty of “binning” the resulting sequences into groups corresponding to the individual species and strains that constitute the community. Moreover, there are presently no methods to track the flow of mobile DNA elements such as plasmids through communities or to determine which of these are co-localized within the same cell. We address these limitations by applying Hi-C, a technology originally designed for the study of three-dimensional genome structure in eukaryotes, to measure the cellular co-localization of DNA sequences. We leveraged Hi-C data generated from a synthetic metagenome sample to accurately cluster metagenome assembly contigs into a small number of groups that separated the genomes of the individual species. The Hi-C data also associated plasmids with the chromosomes of their hosts and with each other orders of magnitude more frequently than with the DNA of other species. We further demonstrated that Hi-C data are highly informative for resolving strain-specific genes and nucleotide substitutions between two closely related E. coli strains, K12 DH10B and BL21 (DE3), indicating that such data may be useful for high-resolution genotyping of microbial populations. Our work demonstrates that Hi-C sequencing data provide valuable information for metagenome analyses that is not currently obtainable by other methods. This application of Hi-C has the potential to provide new perspectives on the fine-scale population structure of microbes, on how antibiotic resistance plasmids (or other genetic elements) mobilize in microbial communities, and on the genetic architecture of heterogeneous tumor clone populations.
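
To make the binning idea concrete, the following is a minimal sketch of how Hi-C read pairs could group contigs by cellular co-localization. It is an illustration under stated assumptions, not the clustering procedure used in this work. The contig names, the toy read-pair counts, the `min_links` threshold, and the functions `count_links` and `cluster_contigs` are all hypothetical. The sketch joins contigs with a simple union-find over raw link counts, whereas a real analysis would likely normalize for contig length and restriction-site density first.

```python
from collections import Counter, defaultdict


def count_links(read_pairs):
    """Count Hi-C read pairs joining each unordered pair of distinct contigs."""
    links = Counter()
    for a, b in read_pairs:
        if a != b:  # pairs within one contig carry no binning signal here
            links[tuple(sorted((a, b)))] += 1
    return links


def cluster_contigs(contigs, links, min_links):
    """Group contigs connected by at least `min_links` pairs (union-find)."""
    parent = {c: c for c in contigs}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for (a, b), n in links.items():
        if n >= min_links:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for c in contigs:
        groups[find(c)].add(c)
    return sorted(sorted(g) for g in groups.values())


# Toy, made-up data: two hypothetical species, one carrying a plasmid.
# Each tuple names the contigs hit by the two ends of one Hi-C read pair.
read_pairs = (
    [("sp1_c1", "sp1_c2")] * 40
    + [("sp1_c2", "sp1_plasmid")] * 25
    + [("sp2_c1", "sp2_c2")] * 30
    + [("sp1_c1", "sp2_c1")] * 1  # rare spurious inter-species ligation
)
contigs = ["sp1_c1", "sp1_c2", "sp1_plasmid", "sp2_c1", "sp2_c2"]

links = count_links(read_pairs)
bins = cluster_contigs(contigs, links, min_links=5)
print(bins)
```

In this toy input the plasmid contig lands in the same bin as its host's chromosomal contigs, while the single inter-species pair falls below the threshold and is ignored. The threshold stands in for the large gap between intracellular and intercellular contact frequencies described above.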