====== Haeyoung's metagenomics resource ======

===== Useful papers, websites, and videos =====

  * [[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4059276/|An introduction to the analysis of shotgun metagenomic data. Front. Plant Sci. (2014)]]
  * [[https://www.ncbi.nlm.nih.gov/pubmed/28898207|Shotgun metagenomics, from sampling to analysis. Nature Biotechnol. (2017)]]
  * [[http://metagenomics-workshop.readthedocs.io/en/latest/index.html|Metagenomic Workshop in Uppsala]]
  * [[http://2017-ucsc-metagenomics.readthedocs.io/en/latest/index.html|2017 Metagenomics workshop at UC Santa Cruz]]
  * [[http://www.metagenomics.wiki/|Metagenomics wiki]]
  * [[http://blog.mothur.org/2016/01/12/mothur-and-qiime/|mothur and QIIME]] Despite their differences in philosophy, //most// of the differences in mothur and QIIME are cosmetic.
  * [[https://youtu.be/ED8VkMLYTTI|Aligning 16S sequences to reference set and filtering the alignment]] This tutorial explains how to align 16S sequences to the SILVA reference set, filter the alignment, and remove identical sequences.

===== 16S rRNA-based =====

==== QIIME ====

  * http://qiime.org/ - no longer supported as of January 1, 2018.
  * [[https://docs.qiime2.org/2019.1/|QIIME 2 user documentation]]
  * [[http://qiime.org/tutorials/index.html|QIIME official tutorial]]
  * [[https://www.youtube.com/watch?v=nWeRN2lKIto|Microbiome/Metagenome Analysis Workshop: QIIME]]
  * [[https://github.com/haruosuz/qiime|QIIME tutorial project]] **highly recommended**
  * [[https://docs.qiime2.org/2017.10/tutorials/moving-pictures/|QIIME 2 "Moving Pictures" tutorial]] - Caporaso's [[https://www.ncbi.nlm.nih.gov/pubmed/21624126|Genome Biol. (2011) paper]]

==== PhyloSeq ====

  * [[http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0061217|PLoS One (2013) paper]] "Handling and analysis of high-throughput microbiome census data"
  * [[http://bioconductor.org/packages/release/bioc/html/phyloseq.html|Bioconductor]]

==== Mothur ====

  * https://www.mothur.org/ - download, wiki, forum & facebook

==== Others ====

  * [[16S rRNA sequencing data types]]
  * [[custom scripts for 16S rRNA sequence analysis]]

===== Taxonomic profiling from metagenomic shotgun sequences (without assembly) =====

  * [[../taxonimic_profiling_from_metagenome_sequences]] (without assembly)
  * [[../custom_kraken_db_test]]
  * [[../bracken_for_species_abundance_esimation]]

===== Shotgun sequencing and assembly-based =====

  * [[https://bioinformatictools.wordpress.com/tag/metamos/|Metagenomic assemblers list]] (2015-09-15)
  * [[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5502489/|Assembling metagenomes, one community at a time. BMC Genomics (2017)]] - choosing a suitable assembler for metagenomes
  * (a milestone paper) Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nature Biotech (2013) [[https://www.ncbi.nlm.nih.gov/pubmed/23707974|PubMed]]

==== Reviews ====

  * (Review) Recovering complete and draft population genomes from metagenome datasets (2016) [[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4782286/|PMC]]
  * (Review) Bioinformatic strategies for taxonomy independent binning and visualization of sequences in shotgun metagenomics (2017) [[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5148923/|PMC]]

===== Phylogenetic classification of contigs =====

  * PhyloSift: phylogenetic analysis of genomes and metagenomes. PeerJ (2014) [[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3897386/|PMC]] [[http://metagenomics-workshop.readthedocs.io/en/latest/taxonomic-classification/phylosift.html|"Using PhyloSift" workshop]]
  * MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nature Methods (2015) [[https://www.nature.com/articles/nmeth.3589|original article]]
  * PhyloPythiaS+: a self-training method for the rapid reconstruction of low-ranking taxonomic bins from metagenomes. PeerJ (2016) [[https://www.ncbi.nlm.nih.gov/pubmed/26870609|PubMed]]
  * ICoVeR – an interactive visualization tool for verification and refinement of metagenomic bins. BMC Bioinformatics (2017) [[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5414344/|PMC]]

CONCOCT, MaxBin, metaBAT...

===== Pipelines or web servers for metagenomic data analysis =====

  * MetAMOS [[https://github.com/marbl/metAMOS|GitHub]] It drew considerable interest in 2016, but these days it attracts only modest attention.
  * MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis [[https://www.biorxiv.org/content/early/2018/03/06/277442|bioRxiv]] [[https://github.com/bxlab/metaWRAP|GitHub]]
  * MicrobiomeAnalyst - a web-based tool for comprehensive statistical, visual and meta-analysis of microbiome data https://www.microbiomeanalyst.ca/

===== A real example of Cyanobacterial genome assembly =====

[[http://genomea.asm.org/content/5/9/e01676-16.full|Draft genome sequences of nine cyanobacterial strains from diverse habitats]]

This is clearly some distance from full-fledged metagenome research, but there are moments when metagenomic methods have to be used. One such moment is when sequencing data have been obtained for the bacterium I want to study, but contamination is unavoidably present or the organism lives in symbiosis with other bacteria in nature. Separating the contigs of the target genome from the many other contigs is important, and so is understanding the diversity of the coexisting bacteria. Genome sequencing of cyanobacteria, the main culprits of the algal blooms that overrun fresh water and seriously harm the environment, is one such case. I recently experienced this problem at length while analyzing the genomes and transcriptomes of two Korean isolates of Microcystis aeruginosa. Conveniently, there is a Genome Announcements paper that sequenced nine cyanobacterial strains using up-to-date bioinformatics methods, so I decided to look at the programs it used. The numbers of contigs in the sequences deposited in NCBI WGS are 92 (Oscillatoria rosea NIES-208), 320 (Nostoc calcicola FACHB-389), 34 (Fischerella major NIES-592), 68 (Hydrococcus rivularis NIES-593), 43 (Chroogloeocystis siderophila NIES-1031), 243 (Calothrix sp. NIES-2101), 90 (Phormidium ambiguum NIES-2119), 179 (Scytonema sp. NIES-2130), and 44 (Phormidium tenue NIES-30). Microcystis aeruginosa was not included.

  * Genome sequencing: Illumina HiSeq 2000 system (2x100 from a ~500 bp fragment genomic library) or MiSeq (2x300 from a 300-500 bp library)
  * **Trimmomatic** v0.33
  * **SPAdes** v3.9.0 with "--meta" mode ([[http://cab.spbu.ru/software/spades/|download]])
  * Binning contigs > 2 kb using **MaxBin** v2.2.1 ([[https://www.ncbi.nlm.nih.gov/pubmed/26515820|paper]] [[http://downloads.jbei.org/data/microbial_communities/MaxBin/MaxBin.html|download]]) - MaxBin is software that clusters metagenomic contigs into bins, each consisting of contigs from one species. MaxBin uses nucleotide composition and contig abundance information to achieve binning through an Expectation-Maximization algorithm.
  * Completeness and contamination assessed using **CheckM** v1.0.5 ([[https://www.ncbi.nlm.nih.gov/pubmed/?term=25977477|paper]] [[http://ecogenomics.github.io/CheckM/|download]]) - an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes.
  * Contigs binned to Cyanobacteria were scaffolded using **BESST** v2.2.4 and **FinishM** v0.0.9
  * Polished using **Pilon** v1.20
  * Scaffolds were taxonomically classified using **Kaiju** and **PhyloPythiaS+**

==== Actual run example ====

**MaxBin** is a fully comprehensive package. During installation it also downloads and builds IDBA-UD, HMMER-3, Bowtie2, and FragGeneScan.

{{ :maxbin.png?direct&600 |}}

==== Other useful programs ====

GroopM, a companion of CheckM, can be used to recover genomes from metagenomic data. [[https://peerj.com/articles/603/|paper]]

===== Shotgun metagenomic analysis 2022 =====

I am starting a new write-up for the analysis of metagenome data that I recently obtained. Rather than supplementing the corresponding sections of the older material above, it is better to create a new section and, if necessary, split it off into a separate wiki document. A recent question and answer on Biostars (posted four months before January 20, 2022) will be an excellent starting point.

[[https://www.biostars.org/p/9488577/|Any suggestions on metagenomic pipelines for processing shotgun metagenomics whole genome sequencing samples?]] - metaWRAP, bioBakery & anvi'o

Shotgun metagenomic analysis does not necessarily use MAGs (metagenome-assembled genomes) as an intermediate. For example, a precise phylogenetic analysis based on a marker gene database does not require the reads to be assembled.

[[metagenomic_data_assembly_pipeline]]
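
As a rough illustration of the assembly-based route, the first steps listed in the cyanobacterial example above (Trimmomatic, SPAdes in "--meta" mode, MaxBin on contigs > 2 kb, CheckM) can be sketched in Python. The sketch only builds the command lines and does not run anything. All file names and the ''build_commands'' helper are hypothetical, the ''trimmomatic'' wrapper command assumes a conda-style installation, and the trimming settings are the generic ones from the Trimmomatic manual, not those of the paper. Options may differ in the older versions cited above, so check each tool's help output before use.

```python
"""Sketch: build (not run) command lines for trimming, assembly,
binning, and bin quality checks. All paths are hypothetical."""
import shlex


def build_commands(r1, r2, workdir, adapters="adapters.fa", min_contig=2000):
    """Return one argv list per step; min_contig approximates 'contigs > 2 kb'."""
    trimmed_1 = f"{workdir}/trimmed_1P.fastq"
    trimmed_2 = f"{workdir}/trimmed_2P.fastq"
    assembly_dir = f"{workdir}/spades"
    bin_dir = f"{workdir}/maxbin"
    return [
        # Trimmomatic paired-end mode: 2 inputs, 4 outputs, then trimming steps.
        # The trimming steps are assumed defaults, not taken from the paper.
        ["trimmomatic", "PE", r1, r2,
         trimmed_1, f"{workdir}/trimmed_1U.fastq",
         trimmed_2, f"{workdir}/trimmed_2U.fastq",
         f"ILLUMINACLIP:{adapters}:2:30:10", "SLIDINGWINDOW:4:15", "MINLEN:36"],
        # SPAdes in metagenome mode; it writes contigs.fasta into the output dir.
        ["spades.py", "--meta", "-1", trimmed_1, "-2", trimmed_2,
         "-o", assembly_dir],
        # MaxBin: bin contigs using composition plus read-derived abundance.
        ["run_MaxBin.pl", "-contig", f"{assembly_dir}/contigs.fasta",
         "-reads", trimmed_1, "-reads2", trimmed_2,
         "-min_contig_length", str(min_contig), "-out", f"{bin_dir}/bin"],
        # CheckM: completeness and contamination of each bin (*.fasta files).
        ["checkm", "lineage_wf", "-x", "fasta", bin_dir, f"{workdir}/checkm"],
    ]


if __name__ == "__main__":
    for argv in build_commands("sample_R1.fastq", "sample_R2.fastq", "work"):
        print(shlex.join(argv))
```

Printing the commands first makes it easy to review them, or to paste them into a job script, before committing compute time. Scaffolding, polishing, and taxonomic classification (BESST, FinishM, Pilon, Kaiju, PhyloPythiaS+) are left out because their settings are not given above.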