Differences

This shows you the differences between two versions of the page.

--- bioinfo:plasmidprofiler [2018/10/17 08:48] – [Running Plasmid Profiler] hyjeong
+++ bioinfo:plasmidprofiler [2021/03/17 13:09] (current) – external edit 127.0.0.1
@@ Line 20: / Line 20: @@
     * plasmid finder replicon database along with genes of interest (supplied but user-modifiable; 58.1 MB): **plasmidfinder_plusAMR.fasta** - This is a collection of plasmid rep genes and some AMR genes. The default DB contains only 5 AMR genes. Sequence IDs have the form '(AMR)OXA181_JN20580'. The file plasmidfinder_plusAMR2.fasta adds a further 2145 AMR genes (I made this one).
   - ([[https://github.com/TGAC/KAT|KAT]]) Removes unrepresented Gammaproteobacteria plasmid sequences and builds an individual plasmid DB for each sample.
-  - ([[https://github.com/katholt/srst2|SRST2]]) Maps the reads to the individual plasmid DB with bowtie2 to find putative plasmid hits.
+  - ([[https://github.com/katholt/srst2|SRST2]]) Maps the reads to the individual plasmid DB with bowtie2 to find putative plasmid hits. At this stage of the pipeline, SRST2 is run using the “Custom Virulence Database” parameter with the individualized plasmid databases serving as the SRST2 database for their respective isolate.
   - (BLAST) Builds a custom BLAST DB from the plasmid sequences identified by SRST2, then searches it with the 116 plasmid replicons from the PlasmidFinder DB (MegaBLAST).
   - (Plasmid Profiler R package) Visualization as a heat map
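To check how many AMR entries a replicon/AMR FASTA file such as plasmidfinder_plusAMR.fasta actually holds, the headers can be counted by prefix. The following is a minimal sketch, assuming AMR entries are recognizable by a leading '(AMR)' in the sequence ID as in the example ID above; the function name count_amr_entries and the demo header 'hypothetical_rep_1' are made up for illustration.

```python
import io
from typing import Iterable, Tuple

# Assumption: AMR entries start with this prefix, as in '(AMR)OXA181_JN20580'.
AMR_PREFIX = "(AMR)"


def count_amr_entries(fasta_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (amr_count, other_count) over the FASTA header lines."""
    amr = 0
    other = 0
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        seq_id = line[1:].strip()
        if seq_id.startswith(AMR_PREFIX):
            amr += 1
        else:
            other += 1
    return amr, other


if __name__ == "__main__":
    # Tiny in-memory demo; for a real file, pass an open file handle instead.
    demo = io.StringIO(">(AMR)OXA181_JN20580\nACGT\n>hypothetical_rep_1\nTTGA\n")
    print(count_amr_entries(demo))
```

For a real database, open the FASTA file and pass the handle, for example `count_amr_entries(open("plasmidfinder_plusAMR.fasta"))`.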
@@ Line 28: / Line 28: @@
 Look for suitable documentation. The Galaxy docker image is officially distributed [[https://github.com/bgruening/docker-galaxy-stable|here]], and the Galaxy + Plasmid Profiler image [[https://github.com/phac-nml/plasmidprofiler-galaxy|here]].
 ==== How to run Docker Plasmid Profiler ====
-Enter the following as root.
+Enter the following as root. Nothing is left on the hard disk after the container exits.
   # docker run -t -p 48888:80 phacnml/plasmidprofiler_0_1_6
@@ Line 35: / Line 35: @@
 To **transfer large files over sftp**, run the following. Note that there is a limit on the size of files that can be transferred from a web browser.
   # docker run -i -t -p 48888:80 -p 8022:22 -v /data/apps/galaxy_storage/:/export/ phacnml/plasmidprofiler_0_1_6
-To transfer files over sftp, connect on port 8022 as follows.
+To transfer files over sftp, connect on port 8022 as follows. The files are stored in /data/apps/galaxy_storage/ftp/admin@galaxy.org (accessible as root).
   $ sftp -v -oPort=8022 -o User=admin@galaxy.org localhost
 ==== Running Plasmid Profiler ====
@@ Line 43: / Line 44: @@
   - In Shared Data > Data Libraries > Plasmid Profiler > Databases, select pp_plasmid_database.fasta and run Add to History. Create a new History at this point.
   - Upload the separately prepared plasmidfinder_plusAMR2.fasta file with Get Data.
-  - Upload the sequence reads and build a dataset collection. If there are many data files, it is better to transfer them over sftp (port 8022). For files uploaded beforehand over sftp, click Choose FTP file under Get Data > Upload File from your computer.  --- //[[hyjeong@kribb.re.kr|Haeyoung Jeong]] 2018/10/15 17:35// Why is the "Choose FTP file" button not showing?
+  - Upload the sequence reads and build a dataset collection. If there are many data files, or a single file exceeds 2 GB and cannot be transferred over http, it is better to transfer them beforehand over sftp (port 8022). For files uploaded beforehand over sftp, click Choose FTP file under Get Data > Upload File from your computer.  --- //[[hyjeong@kribb.re.kr|Haeyoung Jeong]] 2018/10/15 17:35// Why is the "Choose FTP file" button not showing?
     * Set Type to fastqsanger or fastqsanger.gz. Uploaded files will appear in the History panel on the right.
     * Select all fastq files (excluding the two database files added to the history at the start) and run For all selected > Build List of Dataset Pairs. Watch the History panel to check whether the job has finished. Click the Refresh history button from time to time.
-    * If you uploaded fastq.gz files, select and run Collection Operation > Unzip Collection. It is hard to tell whether it has finished, though. The proper procedure for uploaded paired fastq.gz files needs further investigation. Since decompression inside the workflow takes time anyway, it may be better to upload the uncompressed originals over ftp.
+    * (This part is somewhat uncertain. Until the correct usage is worked out, I recommend using decompressed fastq files.) If you uploaded fastq.gz files, select and run Collection Operation > Unzip Collection. It is hard to tell whether it has finished, though. The proper procedure for uploaded paired fastq.gz files needs further investigation. Since decompression inside the workflow takes time anyway, it may be better to upload the uncompressed originals over ftp.
     * Interleaved files cannot be used, because the PlasmidProfiler workflow explicitly specifies paired-end fastqs as its input. I think this is impossible unless the workflow is modified.
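Because the workflow accepts only paired-end fastq files, one workaround for interleaved data is to split it into forward and reverse files before uploading. The following is a minimal sketch, not part of Plasmid Profiler or Galaxy; it assumes uncompressed FASTQ with strict 4-line records in which each forward read is immediately followed by its mate. The names read_records and split_interleaved are made up for illustration.

```python
import io
from itertools import islice
from typing import Iterable, Iterator, List, TextIO


def read_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of 4 lines (assumes no line wrapping)."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def split_interleaved(src: Iterable[str], fwd_out: TextIO, rev_out: TextIO) -> int:
    """Write alternating records to fwd_out and rev_out; return the pair count."""
    pairs = 0
    records = read_records(src)
    for fwd in records:
        rev = next(records, None)  # the mate is expected right after the forward read
        if rev is None:
            raise ValueError("odd number of records; input is not interleaved pairs")
        fwd_out.writelines(fwd)
        rev_out.writelines(rev)
        pairs += 1
    return pairs


if __name__ == "__main__":
    # In-memory demo; with real data, pass open file handles instead.
    demo = io.StringIO("@r1/1\nAC\n+\nII\n@r1/2\nGT\n+\nII\n")
    fwd, rev = io.StringIO(), io.StringIO()
    print(split_interleaved(demo, fwd, rev))
```

The records are streamed one pair at a time, so memory use stays small even for large read files. The two output files can then be uploaded and combined with Build List of Dataset Pairs as described above.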