Samtools是一个用于操作sam和bam文件的工具软件,能够对比对文件进行二进制查看、格式转换、排序及合并等,结合sam格式中的flag、tag等信息,还可以完成比对结果的统计汇总,是处理sam和bam文件(例如:转录组Tophat分析软件输出的比对结果为.bam文件,而重测序中BWA、bowtie等比对软件则主要输出为.sam文件)不可或缺的工具集。
官网:

https://github.com/samtools/samtools
https://github.com/samtools/samtools/releases
https://github.com/samtools/samtools/blob/develop/INSTALL

依赖:

yum install  -y autoconf automake make gcc perl-Data-Dumper zlib-devel bzip2 bzip2-devel xz-devel curl-devel openssl-devel ncurses-devel

编译:

wget https://github.com/samtools/samtools/releases/download/1.11/samtools-1.11.tar.bz2
tar jxvf samtools-1.11.tar.bz2
cd samtools-1.11
./configure
 make -j 8
make install

如果用icc则为./configure CC=icc --prefix=/opt/icc-compiled
安装:

[root@localhost samtools-1.11]# make install
mkdir -p -m 755 /usr/local/bin /usr/local/bin /usr/local/share/man/man1
install -p samtools /usr/local/bin
install -p misc/ace2sam misc/maq2sam-long misc/maq2sam-short misc/md5fa misc/md5sum-lite misc/wgsim /usr/local/bin
install -p misc/blast2sam.pl misc/bowtie2sam.pl misc/export2sam.pl misc/interpolate_sam.pl misc/novo2sam.pl misc/plot-bamstats misc/psl2sam.pl misc/sam2vcf.pl misc/samtools.pl mis
c/seq_cache_populate.pl misc/soap2sam.pl misc/wgsim_eval.pl misc/zoom2sam.pl misc/plot-ampliconstats /usr/local/bininstall -p -m 644 doc/samtools*.1 misc/wgsim.1 /usr/local/share/man/man1

版本:

[root@localhost samtools-1.11]# samtools

Program: samtools (Tools for alignments in the SAM format)
Version: 1.11 (using htslib 1.11)

Usage:   samtools <command> [options]

Commands:
  -- Indexing
     dict           create a sequence dictionary file
     faidx          index/extract FASTA
     fqidx          index/extract FASTQ
     index          index alignment

  -- Editing
     calmd          recalculate MD/NM tags and '=' bases
     fixmate        fix mate information
     reheader       replace BAM header
     targetcut      cut fosmid regions (for fosmid pool only)
     addreplacerg   adds or replaces RG tags
     markdup        mark duplicates
     ampliconclip   clip oligos from the end of reads

  -- File operations
     collate        shuffle and group alignments by name
     cat            concatenate BAMs
     merge          merge sorted alignments
     mpileup        multi-way pileup
     sort           sort alignment file
     split          splits a file by read group
     quickcheck     quickly check if SAM/BAM/CRAM file appears intact
     fastq          converts a BAM to a FASTQ
     fasta          converts a BAM to a FASTA

  -- Statistics
     bedcov         read depth per BED region
     coverage       alignment depth and percent coverage
     depth          compute the depth
     flagstat       simple stats
     idxstats       BAM index stats
     phase          phase heterozygotes
     stats          generate stats (former bamcheck)
     ampliconstats  generate amplicon specific stats

  -- Viewing
     flags          explain BAM flags
     tview          text alignment viewer
     view           SAM<->BAM<->CRAM conversion
     depad          convert padded BAM to unpadded BAM

测试

E.coli K12(一种实验用的大肠杆菌)的参考序列数据:

wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.fna.gz  
gzip -dc GCF_000005845.2_ASM584v2_genomic.fna.gz > E.coli_K12_MG1655.fa

用samtools创建一个索引

[root@localhost ~]# samtools faidx E.coli_K12_MG1655.fa
[root@localhost ~]# ls
E.coli_K12_MG1655.fa  E.coli_K12_MG1655.fa.fai

通过索引来获取fasta文件中任意位置的序列或者任意完整的染色体序列

[root@localhost ~]# samtools faidx E.coli_K12_MG1655.fa NC_000913.3:1000000-1000200
>NC_000913.3:1000000-1000200
GTGTCAGCTTTCGTGGTGTGCAGCTGGCGTCAGATGACAACATGCTGCCAGACAGCCTGA
AAGGGTTTGCGCCTGTGGTGCGTGGTATCGCCAAAAGCAATGCCCAGATAACGATTAAGC
AAAATGGTTACACCATTTACCAAACTTATGTATCGCCTGGTGCTTTTGAAATTAGTGATC
TCTATTCCACGTCGTCGAGCG

发表回复

您的电子邮箱地址不会被公开。 必填项已用*标注

Captcha Code