QIIME 2 Pipeline for
Siberian Flying Squirrel Fecal Microbiome

Siberian flying squirrel
source:https://www.pinterest.com/pin/26810560252776194/

本次分析是使用qiime2-2019.10版本執行 範例raw data下載
資料來源:Liu P.-Y., et al. Variations in Gut Microbiota of Siberian Flying Squirrels Correspond to Seasonal Phenological Changes in Their Hokkaido Subarctic Forest Ecosystem. Microbial ecology 78, 223–231 (2019)

  1. Activate conda environment
conda activate qiime2-2019.10
  1. Import demultiplexed FASTQs as QIIME2 artifact
mkdir reads_qza
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest \
  --output-path reads_qza/paired_end.qza \
  --input-format PairedEndFastqManifestPhred33
  1. Remove amplicon primers (去除V3V4 primer區段,其他區段的話,要去查,或是跟廠商要) / view quality distribution
time qiime cutadapt trim-paired \
--i-demultiplexed-sequences reads_qza/paired_end.qza \
--p-cores 8 \
--p-front-f CCTACGGGNGGCWGCAG \
--p-front-r GACTACHVGGGTATCTAATCC  \
--p-discard-untrimmed \
--o-trimmed-sequences reads_qza/primer-trimmed.qza \
--verbose \
&> primer_trimming.log 

qiime demux summarize \
--i-data reads_qza/primer-trimmed.qza \
--o-visualization reads_qza/primer-trimmed.qzv

p-cores為運算核心數,視電腦的配備調整

  1. Running DADA2
time qiime dada2 denoise-paired --i-demultiplexed-seqs reads_qza/primer-trimmed.qza \
                           --p-trunc-len-f 260 \
                           --p-trunc-len-r 200 \
                           --p-n-threads 8 \
                           --output-dir dada2_output

使用DADA2 denoise法
根據R1跟R2 fastq的quality scores,將R1裁成260bp,R2裁成200bp,注意裁切後的長度要使R1、R2合併時,要有20bp以上的overlapping
p-max-ee為容許錯誤率,預設是2
p-n-threads為運算核心數,視電腦的配備調整

  1. Convert stats QZA to TSV
qiime tools export --input-path dada2_output/denoising_stats.qza --output-path dada2_output

輸出denoise完的報表,可以檢視各樣本filtering, denoise, merge, remove chimera後的read counts

  1. Filter out rare ASVs
qiime feature-table filter-features --i-table dada2_output/table.qza \
                                    --p-min-frequency 1 \
                                    --p-min-samples 1 \
                                    --o-filtered-table dada2_output/table_filt.qza
                                    
qiime feature-table filter-seqs --i-data dada2_output/representative_sequences.qza \
                                --i-table dada2_output/table_filt.qza \
                                --o-filtered-data dada2_output/rep_seqs_filt.qza
                                
qiime feature-table summarize --i-table dada2_output/table_filt.qza --o-visualization dada2_output/table_filt_summary.qzv

過濾rare ASVs,丟掉read count少於p-min-frequency和出現樣本少於p-min-samples的項目
這兩個參數都寫1的話,就是沒有過濾(我後來都跳過這步驟,以保留最多資訊,由後續的統計工具處理)

  1. Assign taxonomy
qiime feature-classifier classify-sklearn \
    --i-reads dada2_output/rep_seqs_filt.qza \
      --i-classifier ~/metagenome_tool/QIIME2_taxa_classifiers/silva-132-99-nb-classifier.qza \
     --p-n-jobs 12 \
     --output-dir taxa  

qiime tools export --input-path taxa/classification.qza --output-path taxa

QIIME2官網推薦使用Naïve Bayesian assign taxonomy方法,不過該方法搭配SILVA 16S全長資料庫會非常耗費記憶體,需要20GB的配置
VSEARCH方法也是QIIME2裡搭配的taxonomy assignment tools之一,相對所需的硬體需求比較少,跑起來也更省時

Ps. Naïve Bayesian搭配SILVA 132或是Greengenes 13_5的16S rRNA gene 全長或是V4 database可以從QIIME2官網上下載到對應QIIME2版本的classifiers。 若是要用別的資料庫或是改用VSEARCH的話,則需要自行建立序列與物種的artifect檔(qza) 後面會做更詳細的指令。

time qiime feature-classifier classify-consensus-vsearch \
    --i-query dada2_output/representative_sequences.qza \
    --i-reference-reads /home/pyliu/Genomics/QIIME2_taxa_classifiers/Silva99_16SRepSeq.qza  \
    --i-reference-taxonomy /home/pyliu/Genomics/QIIME2_taxa_classifiers/Silva99_16STaxonomy.qza \
    --p-threads 8 \
    --verbose \
    --output-dir taxa
  1. Identifying differentially abundant features with ANCOM
time qiime taxa collapse \
  --i-table dada2_output/table_filt.qza \
  --i-taxonomy taxa/classification.qza  \
  --p-level 6 \
  --o-collapsed-table dada2_output/table-l6.qza

qiime composition add-pseudocount --i-table dada2_output/table-l6.qza \
                                  --o-composition-table dada2_output/table_filt_pseudocount.qza
                                  
qiime composition ancom --i-table dada2_output/table_filt_pseudocount.qza \
                        --m-metadata-file q2_metadata.1.txt  \
                        --m-metadata-column groups \
                        --output-dir ancom_output

qiime tools export --input-path ancom_output/visualization.qzv --output-path ancom_output

ANCOM也是官網推崇的組間比較、挑出顯著差異的分析方法
但組間差異不明顯的樣本,幾乎無法分析(所以又跳過了~)

  1. Exporting the final abundance and sequence files
qiime tools export --input-path dada2_output/table_filt.qza --output-path dada2_output_exported
qiime tools export --input-path dada2_output/rep_seqs_filt.qza --output-path dada2_output_exported

sed -i -e '1 s/Feature/#Feature/' -e '1 s/Taxon/taxonomy/' taxa/taxonomy.tsv
biom add-metadata -i dada2_output_exported/feature-table.biom -o dada2_output_exported/feature-table_w_tax.biom --observation-metadata-fp taxa/taxonomy.tsv --sc-separated taxonomy
biom convert -i dada2_output_exported/feature-table_w_tax.biom -o dada2_output_exported/feature-table_w_tax.txt --to-tsv --header-key taxonomy

將qza格式的qiime2 artifact轉回常用的biom或是txt格式,並且跟ASV table / OTU table合併taxonomy資訊
若不合併,則去除--observation-metadata-fp taxa/taxonomy.tsv --sc-separated taxonomy--header-key taxonomy

  1. 使用自建資料庫,以人類口腔微生物資料庫HOMD為例:
# classifier HOMD
time qiime tools import \
 --type 'FeatureData[Sequence]' \
--input-path HOMD_16S_rRNA_RefSeq_V15.2.p9.fasta \
 --output-path HOMD_RefSeq_V15.2.p9.qza

#如需要裁切database的序列長度
#qiime feature-classifier extract-reads \
#  --i-sequences Silva99_16SRepSeq.qza \
#  --p-f-primer CCTACGGGNGGCWGCAG \
#  --p-r-primer GACTACHVGGGTATCTAATCC \
#  --p-trunc-len 120 \  --p-min-length 100 \  --p-max-length 400 \
#  --o-reads Silva99_16SRepSeq_341-805.qza

time qiime tools import \
 --type 'FeatureData[Taxonomy]' \
 --input-path HOMD_16S_rRNA_RefSeq_V15.2.qiime.1.taxonomy \
 --input-format HeaderlessTSVTaxonomyFormat \
 --output-path HOMD_RefSeq_V15.2.taxonomy.qza
 
 
qiime feature-classifier fit-classifier-naive-bayes \
 --i-reference-reads HOMD_RefSeq_V15.2.p9.qza \
 --i-reference-taxonomy HOMD_RefSeq_V15.2.taxonomy.qza \
 --o-classifier HOMD_RefSeq_V15.2.classifier.qza

SILVA 128 版database原始資料下載
SILVA 128 版database壓縮檔下載

  1. 若是用Naïve Bayesian法的話,可以用下列QIIME2指令
qiime feature-classifier classify-sklearn --i-reads dada2_output/rep_seqs_filt.qza \
                                          --i-classifier HOMD_DB/HOMD_RefSeq_V15.2.classifier.qza \
                                          --p-n-jobs 12 \
                                          --output-dir NB_taxa_HOMD
  1. Phylogenetic tree and Unifrac (optional)
#Tree
mkdir tree_out
qiime alignment mafft --i-sequences dada2_output/representative_sequences.qza \
                      --p-n-threads 8 \
                      --o-alignment tree_out/rep_seqs_aligned.qza

qiime alignment mask --i-alignment tree_out/rep_seqs_aligned.qza \
  --o-masked-alignment tree_out/rep_seqs_aligned_masked.qza

qiime phylogeny fasttree --i-alignment tree_out/rep_seqs_aligned_masked.qza \
                         --p-n-threads 8 \
                         --o-tree tree_out/rep_seqs_aligned_masked_tree

qiime phylogeny midpoint-root --i-tree tree_out/rep_seqs_aligned_masked_tree.qza \
                              --o-rooted-tree tree_out/rep_seqs_aligned_masked_tree_rooted.qza
#Unifrac
qiime diversity beta-phylogenetic --i-table dada2_output/table.qza \
                                  --i-phylogeny tree_out/rep_seqs_aligned_masked_tree_rooted.qza \
                                  --p-metric 'unweighted_unifrac' \
                                  --p-n-jobs 8 \
                                  --output-dir beta_phylo/unweighted_unifrac \
                                  --verbose
qiime diversity beta-phylogenetic --i-table dada2_output/table.qza \
                                  --i-phylogeny tree_out/rep_seqs_aligned_masked_tree_rooted.qza \
                                  --p-metric 'weighted_unifrac' \
                                  --p-n-jobs 8 \
                                  --output-dir beta_phylo/weighted_unifrac \
                                  --verbose
  1. Rarefaction curve
qiime diversity alpha-rarefaction \
   --i-table dada2_output/deblur_table_final.qza \
   --p-max-depth X \
   --p-steps 20 \
   --o-visualization rarefaction_curves.qzv

X: the maximum read count of the dataset, see denoising_stats.qza / stats.tsv

  1. De novo clustering [97% similarity] after denoising (optional)
qiime vsearch cluster-features-de-novo \
            --i-sequences dada2_output/representative_sequences.qza \
            --i-table dada2_output/table.qza \
            --p-perc-identity 0.97 \
            --p-threads 6 \
            --output-dir cluster_97

qiime tools export --input-path cluster_97/clustered_table.qza --output-path cluster_97_exported
qiime tools export --input-path cluster_97/clustered_sequences.qza --output-path cluster_97_exported
time qiime feature-classifier classify-consensus-vsearch \
    --i-query cluster_97/clustered_sequences.qza \
    --i-reference-reads ~/Genomics/QIIME2_taxa_classifiers/Silva99_16SRepSeq.qza \
    --i-reference-taxonomy ~/Genomics/QIIME2_taxa_classifiers/Silva99_16STaxonomy.qza \
    --p-threads 6 --p-maxaccepts 1  --p-perc-identity 0.8 \
    --verbose \
    --output-dir taxa_cluster_97  
  
qiime tools export --input-path taxa_cluster_97/classification.qza --output-path taxa_cluster_97

sed -i -e '1 s/Feature/#Feature/' -e '1 s/Taxon/taxonomy/' taxa_cluster_97/taxonomy.tsv
biom add-metadata -i cluster_97_exported/feature-table.biom -o cluster_97_exported/feature-table_w_tax.biom --observation-metadata-fp taxa_cluster_97/taxonomy.tsv --sc-separated taxonomy
biom convert -i cluster_97_exported/feature-table_w_tax.biom -o cluster_97_exported/feature-table_w_tax.txt --to-tsv --header-key taxonomy
  1. Merge feature(ASV) table from 2 different batches
#MERGE DATA
qiime feature-table merge \
--i-tables dada2_output_VM/table.qza dada2_output_VM/turkey.table.qza \
 --o-merged-table dada2_output_VM/merge_table.qza

qiime feature-table merge-taxa \
--i-data taxa_VM/classification.qza taxa_VM/turkey.classification.qza \
 --o-merged-data taxa_VM/merge.classification.qza

qiime feature-table merge-seqs \
--i-data dada2_output_VM/representative_sequences.qza dada2_output_VM/turkey.representative_sequences.qza \
 --o-merged-data dada2_output_VM/merge.representative_sequences.qza

qiime tools export --input-path taxa_VM/merge.classification.qza --output-path taxa_VM/merge.taxonomy

qiime tools export --input-path dada2_output_VM/merge_table.qza --output-path dada2_output_VM_exported/merge.feature-table
qiime tools export --input-path dada2_output_VM/merge.representative_sequences.qza --output-path dada2_output_VM_exported/merge.dna-sequences

sed -i -e '1 s/Feature/#Feature/' -e '1 s/Taxon/taxonomy/' taxa_VM/merge.taxonomy/taxonomy.tsv
biom add-metadata -i dada2_output_VM_exported/merge.feature-table/feature-table.biom -o dada2_output_VM_exported/merge.feature-table/merge.feature-table_w_tax.biom --observation-metadata-fp taxa_VM/merge.taxonomy/taxonomy.tsv --sc-separated taxonomy
biom convert -i dada2_output_VM_exported/merge.feature-table/merge.feature-table_w_tax.biom -o dada2_output_VM_exported/merge.feature-table_w_tax.txt --to-tsv --header-key taxonomy
  1. PICRUSt2
#### PICRUSt2
conda activate picrust2
mkdir picrust2
cd picrust2
place_seqs.py -s ../dada2_output_exported/dna-sequences.fasta -o asv.tre -p 6            --intermediate asv_intermediate/place_seqs --verbose

hsp.py -i 16S -t asv.tre -o asv_marker_predicted_and_nsti.tsv.gz -p 6 -n

hsp.py -i KO -t asv.tre -o KO_predicted_asv.tsv.gz -p 6

time metagenome_pipeline.py -i ../dada2_output_exported/feature-table_w_tax.biom             -m asv_marker_predicted_and_nsti.tsv.gz -f KO_predicted_asv.tsv.gz -o KO_metagenome_out_asv --strat_out

convert_table.py KO_metagenome_out_asv/pred_metagenome_contrib.tsv.gz -c contrib_to_legacy -o KO_metagenome_out_asv/pred_metagenome_contrib.legacy.tsv.gz

time pathway_pipeline.py -i KO_metagenome_out_asv/pred_metagenome_contrib.tsv.gz -o KEGG_pathways_out_asv --no_regroup  --map /home/pyliu/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/pathway_mapfiles/KEGG_pathways_to_KO.tsv -p 6

time pathway_pipeline.py -i KO_metagenome_out_asv/pred_metagenome_contrib.tsv.gz -o KEGG_modules_out_asv --no_regroup --map /home/pyliu/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/pathway_mapfiles/KEGG_modules_to_KO.tsv -p 6

add_descriptions.py -i KO_metagenome_out_asv/pred_metagenome_unstrat.tsv.gz -m KO -o KO_metagenome_out_asv/pred_metagenome_unstrat_descrip.tsv.gz

Resources

QIIME 2 user documentation QIIME2 官方網站
PICRUSt2 Metagenome功能預測工具
Microbiome Helper 16S/shotgun metagenome 分析建議pipeline
MicrobiomeAnalyst 16S microbiome profiling統計及視覺化工具
GUSTA ME 微生物生態學相關統計教學
MARco Microbiome R語言分析作圖懶人包