Table of Contents

MyPro 활용하기

MyPro(a seamless pipeline for automated prokaryotic genome assembly and annotation, PubMed)는 여러가지면에서 매우 독특하다. 설치와 사용이 쉬운 생명정보 분석용 도구는 누구나 꿈꾸고 있는 바이다. 파라미터 설정과 같은 골치아픈 것들은 생각하지 않고 그저 마우스 클릭 한번에 원하는 분석을 다 해준다면 얼마나 좋을까? 이를 구현하는 방법에도 다양한 형태가 존재한다. A5-miseq처럼 command line interface의 파이프라인이 있는가 하면, Unipro UGene처럼 GUI를 완벽하게 갖춘 통합 환경도 있다. 개별적으로 설치해야 하는 요소 프로그램이 많아지다보니 아예 이를 전부 담고 있는 리눅스 환경을 ISO 이미지로 제공하기도 한다(예: CMG-Biotools). MyPro는 아예 VirtualBox나 VMware에서 구동 가능한 가상머신 파일을 제공하고 있다. A schematic diagram shows modules, corresponding software programs and functions in the MyPro pipeline. Fig. 1 from J. Microbiol. Meth. (2015) 113:72-74.

설치

MyPro.ova 파일은 여기에서 다운로드한다. 아주 간단하게 작성된 quick-start guide가 포함되어 있다. VirtualBox에서 파일→가상시스템 가져오기를 실행하면 그만이다. 소프트웨어 파이프라인과 요소 프로그램, 설명 자료는 sourceforge 사이트에서 구하면 된다.

Customization

sudo mount -t vboxsf -o uid=1000,gid=1000 Vboxshare /home/student/Vboxshare

MyPro 실행 과정

Preprocess.py 입력한 genome size 추정치를 기준으로 100x만큼 데이터를 취한다.

Preprocess.py -read1 [/path/read1.fastq] -read2 [/path/read2.fastq] -g [genomesize] -d [depth] -l [limit] -r   [reduce] -min [min length]
-g genomesize. [default:9000000]
-d Int. depth. [default:100]
-l Float. cutoff value for quality trimming. [default:0.01]
-r Int. step size for reducing target length. [default: 10] 
-min Int.  Discard reads below length.(defult 50)
-h Help.

실제 실행 과정

manager@bl8vbox[work_160122] Preprocess.py -read1 DH-1_1.fastq -read2 DH-1_2.fastq -g 5000000
Trimming_Tool.py -q DH-1_1.fastq -limit 0.010000 -min 50
1000000
2000000
3000000
4000000
5000000
It costs you 1699.721900
Number of reads:5357118
Number of reads after trim:5097200
Trimming_Tool.py -q DH-1_2.fastq -limit 0.010000 -min 50
1000000
2000000
3000000
4000000
5000000
It costs you 1465.877421
Number of reads:5357118
Number of reads after trim:3581732
GetReadInfo.py 5000000 trimmed_DH-1_1.fastq
5000000
trimmed_DH-1_1.fastq
seqs amount:5097200
seqLen Mean:144.347132
seqLen Std:16.762850
total:735.77 Mb
depth: 147.15X

GetReadInfo.py 5000000 trimmed_DH-1_2.fastq
5000000
trimmed_DH-1_2.fastq
seqs amount:3581732
seqLen Mean:135.639087
seqLen Std:25.541227
total:485.82 Mb
depth: 97.16X

Target Read Length:151
GetPairReads.py -q1 trimmed_DH-1_1.fastq -q2 trimmed_DH-1_2.fastq -l 151
833051
GetReadInfo.py 5000000 trimmed_R1.fastq
5000000
trimmed_R1.fastq
seqs amount:833051
seqLen Mean:151.000000
seqLen Std:0.000000
total:125.79 Mb
depth: 25.16X

GetReadInfo.py 5000000 trimmed_R2.fastq
5000000
trimmed_R2.fastq
seqs amount:833051
seqLen Mean:151.000000
seqLen Std:0.000000
total:125.79 Mb
depth: 25.16X

Target Read Length:141
continue
GetPairReads.py -q1 trimmed_DH-1_1.fastq -q2 trimmed_DH-1_2.fastq -l 141
2216278
GetReadInfo.py 5000000 trimmed_R1.fastq
5000000
trimmed_R1.fastq
seqs amount:2216278
seqLen Mean:141.000000
seqLen Std:0.000000
total:312.50 Mb
depth: 62.50X

GetReadInfo.py 5000000 trimmed_R2.fastq
5000000
trimmed_R2.fastq
seqs amount:2216278
seqLen Mean:141.000000
seqLen Std:0.000000
total:312.50 Mb
depth: 62.50X

Target Read Length:141
Read1 depth:62X
Read2 depth:62X
Get 50X.
head -n 7092196 trimmed_R1.fastq > 50X_R1.fastq

head -n 7092196 trimmed_R2.fastq > 50X_R2.fastq

exit

AutoRun.py AutoRun은 Assemble.py, Integrate.py 및 Annotate.py를 포함한다. project name에 특수문자 '-'를 넣으면 안된다.

AutoRun.py -h                                 [ 4:04AM]
Usage:
AutoRun.py [projectName] -read1 [/path/read1.fastq] -read2 [/path/read2.fastq] [options]
-p Str. parameters for annotate.py
-thr Int. To remove ctg with lower than thr depth.
-h Help.

manager@bl8vbox[work_160122] AutoRun.py DH1 -read1 50X_R1.fastq -read2 50X_R2.fastq -evaluate -p '--prefix DH-1 --genus Methylomonas --species UNKNOWN --strain DH-1 --addgenes --locustag DH1' 
2016-01-22,05:21
InitPipeline.py DH1
Assemble.py DH1 -read1 50X_R1.fastq -read2 50X_R2.fastq -evaluate -thr 100
VelvetOptimiser.pl --s 84 --e 126 --x 8 -f '-fastq -shortPaired -separate /home/manager/work_160122/50X_R1.fastq /home/manager/work_160122/50X_R2.fastq'
echo 'manager' | sudo -S sh -c ' echo 1 > /proc/sys/vm/drop_caches'
edena -paired /home/manager/work_160122/50X_R1.fastq /home/manager/work_160122/50X_R2.fastq
edena -e out.ovl -m 70
edena -e out.ovl -m 80
edena -e out.ovl -m 90
edena -e out.ovl -m 100
edena -e out.ovl -m 110
edena -e out.ovl -m 120
edena -e out.ovl -m 130
edena -e out.ovl -m 140
80.fasta N50:91060
100.fasta N50:91060
90.fasta N50:91060
70.fasta N50:87710
110.fasta N50:75560
120.fasta N50:34470
130.fasta N50:1140
140.fasta N50:859
cp 80.fasta edena.ctg.fa
echo 'manager' | sudo -S sh -c ' echo 1 > /proc/sys/vm/drop_caches'
abyss-pe k=99 name=abyss in='/home/manager/work_160122/50X_R1.fastq /home/manager/work_160122/50X_R2.fastq'
cp abyss-contigs.fa abyss.ctg.fa
echo 'manager' | sudo -S sh -c ' echo 1 > /proc/sys/vm/drop_caches'
SOAPdenovo-127mer all -s myconfig -K 99 -R -o output > log
echo 'manager' | sudo -S sh -c ' echo 1 > /proc/sys/vm/drop_caches'
soapdenovo is failed.
spades.py -k 21,33,55 --careful -1 /home/manager/work_160122/50X_R1.fastq -2 /home/manager/work_160122/50X_R2.fastq -o output
echo 'manager' | sudo -S sh -c ' echo 1 > /proc/sys/vm/drop_caches'

k:99 insert:376 sd:126

2bwt-builder velvet.ctg.fa
soap -a /home/manager/work_160122/50X_R1.fastq -b /home/manager/work_160122/50X_R2.fastq -D velvet.ctg.fa.index -M 4 -r 2 -o Result -2 seResult -m 124 -x 628
['Paired:      1499115 (84.55%) PE', 'Singled:     366826 (10.34%) SE']
raw.velvet.ctg.fa Alignment:94.89%
To Remove Ctgs:87
2bwt-builder abyss.ctg.fa
soap -a /home/manager/work_160122/50X_R1.fastq -b /home/manager/work_160122/50X_R2.fastq -D abyss.ctg.fa.index -M 4 -r 2 -o Result -2 seResult -m 124 -x 628
['Paired:      1529999 (86.29%) PE', 'Singled:     326046 ( 9.19%) SE']
raw.abyss.ctg.fa Alignment:95.48%
To Remove Ctgs:11
2bwt-builder edena.ctg.fa
soap -a /home/manager/work_160122/50X_R1.fastq -b /home/manager/work_160122/50X_R2.fastq -D edena.ctg.fa.index -M 4 -r 2 -o Result -2 seResult -m 124 -x 628
['Paired:      1528338 (86.20%) PE', 'Singled:     349214 ( 9.85%) SE']
raw.edena.ctg.fa Alignment:96.05%
To Remove Ctgs:46
2bwt-builder spades.ctg.fa
soap -a /home/manager/work_160122/50X_R1.fastq -b /home/manager/work_160122/50X_R2.fastq -D spades.ctg.fa.index -M 4 -r 2 -o Result -2 seResult -m 124 -x 628
['Paired:      1520648 (85.76%) PE', 'Singled:     340690 ( 9.61%) SE']
raw.spades.ctg.fa Alignment:95.37%
To Remove Ctgs:165
n50:
	abyss.ctg.fa: 85145
	spades.ctg.fa: 88429
	edena.ctg.fa: 91061
	velvet.ctg.fa: 91123
Ctgs:
	velvet.ctg.fa: 170
	spades.ctg.fa: 163
	abyss.ctg.fa: 141
	edena.ctg.fa: 120
The longest ctg's length:
	spades.ctg.fa: 258373
	velvet.ctg.fa: 258673
	abyss.ctg.fa: 261513
	edena.ctg.fa: 262421
WholeGenome:
	velvet.ctg.fa: 4965202
	edena.ctg.fa: 4995487
	spades.ctg.fa: 5015797
	abyss.ctg.fa: 5169166
Alignment %:
	raw.velvet.ctg.fa: 94.89
	raw.spades.ctg.fa: 95.37
	raw.abyss.ctg.fa: 95.48
	raw.edena.ctg.fa: 96.05
Keep all assemblies.
Integrate.py DH1 -i abyss.ctg.fa,edena.ctg.fa,velvet.ctg.fa,soap.ctg.fa,spades.ctg.fa -evaluate
CISA path:/home/manager/MyTools/Tools/CISA1.3
soap.ctg.fa does not exist in Assemble folder.
Contigs: velvet.ctg.fa,spades.ctg.fa,abyss.ctg.fa,edena.ctg.fa
Merge.py ToMerge.config
../Assemble/velvet.ctg.fa.p.fa
Number of contigs: 164
Length of the longest contig: 258673
whole:4956860
N50: 87625
../Assemble/spades.ctg.fa.p.fa
Number of contigs: 148
Length of the longest contig: 258373
whole:5008510
N50: 88429
../Assemble/abyss.ctg.fa.p.fa
Number of contigs: 139
Length of the longest contig: 261513
whole:5166754
N50: 85145
../Assemble/edena.ctg.fa.p.fa
Number of contigs: 116
Length of the longest contig: 262421
whole:4993154
N50: 91061
echo 'y' | CISA.py CISA.config

N50.py cisa.ctg.fa
2bwt-builder cisa.ctg.fa
soap -a /home/manager/work_160122/50X_R1.fastq -b /home/manager/work_160122/50X_R2.fastq -D cisa.ctg.fa.index -M 4 -r 2 -o Result -2 seResult -m 124 -x 628
['Paired:      1542144 (86.98%) PE', 'Singled:     312106 ( 8.80%) SE']


cisa.ctg.fa Alignment:95.78%
whole:5137529
N50: 100923
Number of contigs: 103
Length of the longest contig: 262421
Annotate.py DH1 -i cisa.ctg.fa -p '--prefix DH-1 --genus Methylomonas --species UNKNOWN --strain DH-1 --addgenes --locustag DH1'
cp ../Integrate/cisa.ctg.fa ./
prokka --prefix DH-1 --genus Methylomonas --species UNKNOWN --strain DH-1 --addgenes --locustag DH1 --usegenus my.ctg.fa
organism: Methylomonas unknown DH-1 
contigs: 103
bases: 5137529
tmRNA: 1
rRNA: 11
repeat_region: 4
tRNA: 52
gene: 4624
CDS: 4560
2016-01-22,13:01

Postassembly.py 가까운 reference genome이 있으면 r2cat으로 contig를 정렬(ordering)한다. 정렬된 contig 말단에 15 bp 이상의 overlap이 있으면 병합한다. 병합된 contig 위에서 SOAP2로 정렬(aligning)되기 어려운 read들은 EDENA로 local assembly를 해서 contig 연결에 사용한다.