source:https://www.pinterest.com/pin/26810560252776194/
本次分析是使用qiime2-2019.10版本執行 範例raw data下載
資料來源:Liu P.-Y., et al. Variations in Gut Microbiota of Siberian Flying Squirrels Correspond to Seasonal Phenological Changes in Their Hokkaido Subarctic Forest Ecosystem. Microbial ecology 78, 223–231 (2019)
conda activate qiime2-2019.10
mkdir reads_qza
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path manifest \
--output-path reads_qza/paired_end.qza \
--input-format PairedEndFastqManifestPhred33
time qiime cutadapt trim-paired \
--i-demultiplexed-sequences reads_qza/paired_end.qza \
--p-cores 8 \
--p-front-f CCTACGGGNGGCWGCAG \
--p-front-r GACTACHVGGGTATCTAATCC \
--p-discard-untrimmed \
--o-trimmed-sequences reads_qza/primer-trimmed.qza \
--verbose \
&> primer_trimming.log
qiime demux summarize \
--i-data reads_qza/primer-trimmed.qza \
--o-visualization reads_qza/primer-trimmed.qzv
p-cores為運算核心數,視電腦的配備調整
time qiime dada2 denoise-paired --i-demultiplexed-seqs reads_qza/primer-trimmed.qza \
--p-trunc-len-f 260 \
--p-trunc-len-r 200 \
--p-n-threads 8 \
--output-dir dada2_output
使用DADA2 denoise法
根據R1跟R2 fastq的quality scores,將R1裁成260bp,R2裁成200bp,注意裁切後的長度要使R1、R2合併時,要有20bp以上的overlapping
p-max-ee為容許錯誤率,預設是2
p-n-threads為運算核心數,視電腦的配備調整
qiime tools export --input-path dada2_output/denoising_stats.qza --output-path dada2_output
輸出denoise完的報表,可以檢視各樣本filtering, denoise, merge, remove chimera後的read counts
qiime feature-table filter-features --i-table dada2_output/table.qza \
--p-min-frequency 1 \
--p-min-samples 1 \
--o-filtered-table dada2_output/table_filt.qza
qiime feature-table filter-seqs --i-data dada2_output/representative_sequences.qza \
--i-table dada2_output/table_filt.qza \
--o-filtered-data dada2_output/rep_seqs_filt.qza
qiime feature-table summarize --i-table dada2_output/table_filt.qza --o-visualization dada2_output/table_filt_summary.qzv
過濾rare ASVs,丟掉read count少於p-min-frequency和出現樣本少於p-min-samples的項目
這兩個參數都寫1的話,就是沒有過濾(我後來都跳過這步驟,以保留最多資訊,由後續的統計工具處理)
qiime feature-classifier classify-sklearn \
--i-reads dada2_output/rep_seqs_filt.qza \
--i-classifier ~/metagenome_tool/QIIME2_taxa_classifiers/silva-132-99-nb-classifier.qza \
--p-n-jobs 12 \
--output-dir taxa
qiime tools export --input-path taxa/classification.qza --output-path taxa
QIIME2官網推薦使用Naïve Bayesian assign taxonomy方法,不過該方法搭配SILVA 16S全長資料庫會非常耗費記憶體,需要20GB的配置
VSEARCH方法也是QIIME2裡搭配的taxonomy assignment tools之一,相對所需的硬體需求比較少,跑起來也更省時
Ps. Naïve Bayesian搭配SILVA 132或是Greengenes 13_5的16S rRNA gene 全長或是V4 database可以從QIIME2官網上下載到對應QIIME2版本的classifiers。 若是要用別的資料庫或是改用VSEARCH的話,則需要自行建立序列與物種的artifect檔(qza) 後面會做更詳細的指令。
time qiime feature-classifier classify-consensus-vsearch \
--i-query dada2_output/representative_sequences.qza \
--i-reference-reads /home/pyliu/Genomics/QIIME2_taxa_classifiers/Silva99_16SRepSeq.qza \
--i-reference-taxonomy /home/pyliu/Genomics/QIIME2_taxa_classifiers/Silva99_16STaxonomy.qza \
--p-threads 8 \
--verbose \
--output-dir taxa
time qiime taxa collapse \
--i-table dada2_output/table_filt.qza \
--i-taxonomy taxa/classification.qza \
--p-level 6 \
--o-collapsed-table dada2_output/table-l6.qza
qiime composition add-pseudocount --i-table dada2_output/table-l6.qza \
--o-composition-table dada2_output/table_filt_pseudocount.qza
qiime composition ancom --i-table dada2_output/table_filt_pseudocount.qza \
--m-metadata-file q2_metadata.1.txt \
--m-metadata-column groups \
--output-dir ancom_output
qiime tools export --input-path ancom_output/visualization.qzv --output-path ancom_output
ANCOM也是官網推崇的組間比較、挑出顯著差異的分析方法
但組間差異不明顯的樣本,幾乎無法分析(所以又跳過了~)
qiime tools export --input-path dada2_output/table_filt.qza --output-path dada2_output_exported
qiime tools export --input-path dada2_output/rep_seqs_filt.qza --output-path dada2_output_exported
sed -i -e '1 s/Feature/#Feature/' -e '1 s/Taxon/taxonomy/' taxa/taxonomy.tsv
biom add-metadata -i dada2_output_exported/feature-table.biom -o dada2_output_exported/feature-table_w_tax.biom --observation-metadata-fp taxa/taxonomy.tsv --sc-separated taxonomy
biom convert -i dada2_output_exported/feature-table_w_tax.biom -o dada2_output_exported/feature-table_w_tax.txt --to-tsv --header-key taxonomy
將qza格式的qiime2 artifact轉回常用的biom或是txt格式,並且跟ASV table / OTU table合併taxonomy資訊
若不合併,則去除--observation-metadata-fp taxa/taxonomy.tsv --sc-separated taxonomy
跟 --header-key taxonomy
# classifier HOMD
time qiime tools import \
--type 'FeatureData[Sequence]' \
--input-path HOMD_16S_rRNA_RefSeq_V15.2.p9.fasta \
--output-path HOMD_RefSeq_V15.2.p9.qza
#如需要裁切database的序列長度
#qiime feature-classifier extract-reads \
# --i-sequences Silva99_16SRepSeq.qza \
# --p-f-primer CCTACGGGNGGCWGCAG \
# --p-r-primer GACTACHVGGGTATCTAATCC \
# --p-trunc-len 120 \ --p-min-length 100 \ --p-max-length 400 \
# --o-reads Silva99_16SRepSeq_341-805.qza
time qiime tools import \
--type 'FeatureData[Taxonomy]' \
--input-path HOMD_16S_rRNA_RefSeq_V15.2.qiime.1.taxonomy \
--input-format HeaderlessTSVTaxonomyFormat \
--output-path HOMD_RefSeq_V15.2.taxonomy.qza
qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads HOMD_RefSeq_V15.2.p9.qza \
--i-reference-taxonomy HOMD_RefSeq_V15.2.taxonomy.qza \
--o-classifier HOMD_RefSeq_V15.2.classifier.qza
SILVA 128 版database原始資料下載
SILVA 128 版database壓縮檔下載
qiime feature-classifier classify-sklearn --i-reads dada2_output/rep_seqs_filt.qza \
--i-classifier HOMD_DB/HOMD_RefSeq_V15.2.classifier.qza \
--p-n-jobs 12 \
--output-dir NB_taxa_HOMD
#Tree
mkdir tree_out
qiime alignment mafft --i-sequences dada2_output/representative_sequences.qza \
--p-n-threads 8 \
--o-alignment tree_out/rep_seqs_aligned.qza
qiime alignment mask --i-alignment tree_out/rep_seqs_aligned.qza \
--o-masked-alignment tree_out/rep_seqs_aligned_masked.qza
qiime phylogeny fasttree --i-alignment tree_out/rep_seqs_aligned_masked.qza \
--p-n-threads 8 \
--o-tree tree_out/rep_seqs_aligned_masked_tree
qiime phylogeny midpoint-root --i-tree tree_out/rep_seqs_aligned_masked_tree.qza \
--o-rooted-tree tree_out/rep_seqs_aligned_masked_tree_rooted.qza
#Unifrac
qiime diversity beta-phylogenetic --i-table dada2_output/table.qza \
--i-phylogeny tree_out/rep_seqs_aligned_masked_tree_rooted.qza \
--p-metric 'unweighted_unifrac' \
--p-n-jobs 8 \
--output-dir beta_phylo/unweighted_unifrac \
--verbose
qiime diversity beta-phylogenetic --i-table dada2_output/table.qza \
--i-phylogeny tree_out/rep_seqs_aligned_masked_tree_rooted.qza \
--p-metric 'weighted_unifrac' \
--p-n-jobs 8 \
--output-dir beta_phylo/weighted_unifrac \
--verbose
qiime diversity alpha-rarefaction \
--i-table dada2_output/deblur_table_final.qza \
--p-max-depth X \
--p-steps 20 \
--o-visualization rarefaction_curves.qzv
X: the maximum read count of the dataset, see denoising_stats.qza / stats.tsv
qiime vsearch cluster-features-de-novo \
--i-sequences dada2_output/representative_sequences.qza \
--i-table dada2_output/table.qza \
--p-perc-identity 0.97 \
--p-threads 6 \
--output-dir cluster_97
qiime tools export --input-path cluster_97/clustered_table.qza --output-path cluster_97_exported
qiime tools export --input-path cluster_97/clustered_sequences.qza --output-path cluster_97_exported
time qiime feature-classifier classify-consensus-vsearch \
--i-query cluster_97/clustered_sequences.qza \
--i-reference-reads ~/Genomics/QIIME2_taxa_classifiers/Silva99_16SRepSeq.qza \
--i-reference-taxonomy ~/Genomics/QIIME2_taxa_classifiers/Silva99_16STaxonomy.qza \
--p-threads 6 --p-maxaccepts 1 --p-perc-identity 0.8 \
--verbose \
--output-dir taxa_cluster_97
qiime tools export --input-path taxa_cluster_97/classification.qza --output-path taxa_cluster_97
sed -i -e '1 s/Feature/#Feature/' -e '1 s/Taxon/taxonomy/' taxa_cluster_97/taxonomy.tsv
biom add-metadata -i cluster_97_exported/feature-table.biom -o cluster_97_exported/feature-table_w_tax.biom --observation-metadata-fp taxa_cluster_97/taxonomy.tsv --sc-separated taxonomy
biom convert -i cluster_97_exported/feature-table_w_tax.biom -o cluster_97_exported/feature-table_w_tax.txt --to-tsv --header-key taxonomy
#MERGE DATA
qiime feature-table merge \
--i-tables dada2_output_VM/table.qza dada2_output_VM/turkey.table.qza \
--o-merged-table dada2_output_VM/merge_table.qza
qiime feature-table merge-taxa \
--i-data taxa_VM/classification.qza taxa_VM/turkey.classification.qza \
--o-merged-data taxa_VM/merge.classification.qza
qiime feature-table merge-seqs \
--i-data dada2_output_VM/representative_sequences.qza dada2_output_VM/turkey.representative_sequences.qza \
--o-merged-data dada2_output_VM/merge.representative_sequences.qza
qiime tools export --input-path taxa_VM/merge.classification.qza --output-path taxa_VM/merge.taxonomy
qiime tools export --input-path dada2_output_VM/merge_table.qza --output-path dada2_output_VM_exported/merge.feature-table
qiime tools export --input-path dada2_output_VM/merge.representative_sequences.qza --output-path dada2_output_VM_exported/merge.dna-sequences
sed -i -e '1 s/Feature/#Feature/' -e '1 s/Taxon/taxonomy/' taxa_VM/merge.taxonomy/taxonomy.tsv
biom add-metadata -i dada2_output_VM_exported/merge.feature-table/feature-table.biom -o dada2_output_VM_exported/merge.feature-table/merge.feature-table_w_tax.biom --observation-metadata-fp taxa_VM/merge.taxonomy/taxonomy.tsv --sc-separated taxonomy
biom convert -i dada2_output_VM_exported/merge.feature-table/merge.feature-table_w_tax.biom -o dada2_output_VM_exported/merge.feature-table_w_tax.txt --to-tsv --header-key taxonomy
#### PICRUSt2
conda activate picrust2
mkdir picrust2
cd picrust2
place_seqs.py -s ../dada2_output_exported/dna-sequences.fasta -o asv.tre -p 6 --intermediate asv_intermediate/place_seqs --verbose
hsp.py -i 16S -t asv.tre -o asv_marker_predicted_and_nsti.tsv.gz -p 6 -n
hsp.py -i KO -t asv.tre -o KO_predicted_asv.tsv.gz -p 6
time metagenome_pipeline.py -i ../dada2_output_exported/feature-table_w_tax.biom -m asv_marker_predicted_and_nsti.tsv.gz -f KO_predicted_asv.tsv.gz -o KO_metagenome_out_asv --strat_out
convert_table.py KO_metagenome_out_asv/pred_metagenome_contrib.tsv.gz -c contrib_to_legacy -o KO_metagenome_out_asv/pred_metagenome_contrib.legacy.tsv.gz
time pathway_pipeline.py -i KO_metagenome_out_asv/pred_metagenome_contrib.tsv.gz -o KEGG_pathways_out_asv --no_regroup --map /home/pyliu/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/pathway_mapfiles/KEGG_pathways_to_KO.tsv -p 6
time pathway_pipeline.py -i KO_metagenome_out_asv/pred_metagenome_contrib.tsv.gz -o KEGG_modules_out_asv --no_regroup --map /home/pyliu/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/pathway_mapfiles/KEGG_modules_to_KO.tsv -p 6
add_descriptions.py -i KO_metagenome_out_asv/pred_metagenome_unstrat.tsv.gz -m KO -o KO_metagenome_out_asv/pred_metagenome_unstrat_descrip.tsv.gz
QIIME 2 user documentation QIIME2 官方網站
PICRUSt2 Metagenome功能預測工具
Microbiome Helper 16S/shotgun metagenome 分析建議pipeline
MicrobiomeAnalyst 16S microbiome profiling統計及視覺化工具
GUSTA ME 微生物生態學相關統計教學
MARco Microbiome R語言分析作圖懶人包