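Compiling and installing samtools 1.11 on CentOS
Samtools is a toolkit for working with SAM and BAM files. It can view binary alignment files, convert between formats, and sort and merge them; using the flag and tag fields of the SAM format, it can also produce summary statistics for alignment results. It is indispensable for handling SAM and BAM files (for example, the transcriptome aligner TopHat writes its alignments as .bam files, while resequencing aligners such as BWA and bowtie mainly write .sam files).
Official site: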
https://github.com/samtools/samtools
https://github.com/samtools/samtools/releases
https://github.com/samtools/samtools/blob/develop/INSTALL
Dependencies:
yum install -y autoconf automake make gcc perl-Data-Dumper zlib-devel bzip2 bzip2-devel xz-devel curl-devel openssl-devel ncurses-devel
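Before running configure, it can help to confirm that the packages above are actually present. The following is a minimal sketch, assuming an RPM-based system; missing_pkgs is a hypothetical helper name, not part of samtools or yum.

```shell
# Package list copied from the yum command above.
DEPS="autoconf automake make gcc perl-Data-Dumper zlib-devel bzip2 bzip2-devel xz-devel curl-devel openssl-devel ncurses-devel"

# missing_pkgs (hypothetical helper): print every named package that
# "rpm -q" does not report as installed. Where rpm itself is absent,
# every package is reported as missing.
missing_pkgs() {
    for pkg in "$@"; do
        rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Usage (word splitting of $DEPS is intentional):
#   missing_pkgs $DEPS
```

An empty result suggests that all listed packages are installed; any names printed can be passed back to yum install.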
Build:
wget https://github.com/samtools/samtools/releases/download/1.11/samtools-1.11.tar.bz2
tar jxvf samtools-1.11.tar.bz2
cd samtools-1.11
./configure
make -j 8
make install
To build with icc instead, run configure as follows:
./configure CC=icc --prefix=/opt/icc-compiled
Install:
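When a custom --prefix is used, as in the icc example, the binaries land in PREFIX/bin, which is usually not on PATH. The sketch below wraps the build steps in a function and adds that directory to PATH. The prefix value and the function name build_samtools are assumptions for illustration, and nproc is used only to pick a job count instead of the fixed -j 8.

```shell
# Hypothetical install prefix; any writable directory works.
PREFIX="${PREFIX:-$HOME/opt/samtools-1.11}"

# Number of parallel make jobs: use nproc when available, else 1.
JOBS="$(nproc 2>/dev/null || echo 1)"

# build_samtools (hypothetical helper): configure, compile and install.
# Must be called from inside the unpacked samtools-1.11 directory.
build_samtools() {
    ./configure --prefix="$PREFIX" &&
    make -j "$JOBS" &&
    make install
}

# Make the installed binaries visible to the current shell.
export PATH="$PREFIX/bin:$PATH"

# Usage:
#   cd samtools-1.11 && build_samtools
```

Installing under a user-owned prefix also avoids the need to run make install as root.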
[root@localhost samtools-1.11]# make install
mkdir -p -m 755 /usr/local/bin /usr/local/bin /usr/local/share/man/man1
install -p samtools /usr/local/bin
install -p misc/ace2sam misc/maq2sam-long misc/maq2sam-short misc/md5fa misc/md5sum-lite misc/wgsim /usr/local/bin
install -p misc/blast2sam.pl misc/bowtie2sam.pl misc/export2sam.pl misc/interpolate_sam.pl misc/novo2sam.pl misc/plot-bamstats misc/psl2sam.pl misc/sam2vcf.pl misc/samtools.pl misc/seq_cache_populate.pl misc/soap2sam.pl misc/wgsim_eval.pl misc/zoom2sam.pl misc/plot-ampliconstats /usr/local/bin
install -p -m 644 doc/samtools*.1 misc/wgsim.1 /usr/local/share/man/man1
Version:
[root@localhost samtools-1.11]# samtools
Program: samtools (Tools for alignments in the SAM format)
Version: 1.11 (using htslib 1.11)
Usage: samtools <command> [options]
Commands:
  -- Indexing
     dict           create a sequence dictionary file
     faidx          index/extract FASTA
     fqidx          index/extract FASTQ
     index          index alignment
  -- Editing
     calmd          recalculate MD/NM tags and '=' bases
     fixmate        fix mate information
     reheader       replace BAM header
     targetcut      cut fosmid regions (for fosmid pool only)
     addreplacerg   adds or replaces RG tags
     markdup        mark duplicates
     ampliconclip   clip oligos from the end of reads
  -- File operations
     collate        shuffle and group alignments by name
     cat            concatenate BAMs
     merge          merge sorted alignments
     mpileup        multi-way pileup
     sort           sort alignment file
     split          splits a file by read group
     quickcheck     quickly check if SAM/BAM/CRAM file appears intact
     fastq          converts a BAM to a FASTQ
     fasta          converts a BAM to a FASTA
  -- Statistics
     bedcov         read depth per BED region
     coverage       alignment depth and percent coverage
     depth          compute the depth
     flagstat       simple stats
     idxstats       BAM index stats
     phase          phase heterozygotes
     stats          generate stats (former bamcheck)
     ampliconstats  generate amplicon specific stats
  -- Viewing
     flags          explain BAM flags
     tview          text alignment viewer
     view           SAM<->BAM<->CRAM conversion
     depad          convert padded BAM to unpadded BAM
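For scripted checks, the first line printed by samtools --version (of the form "samtools 1.11") is easier to parse than the usage screen above. The sketch below extracts the version number from that output; parse_samtools_version is a hypothetical helper name.

```shell
# parse_samtools_version (hypothetical helper): read the output of
# "samtools --version" on stdin and print the version from its first line.
parse_samtools_version() {
    awk 'NR == 1 && $1 == "samtools" { print $2 }'
}

# Usage, once samtools is on PATH:
#   samtools --version | parse_samtools_version
```

Comparing the printed value with 1.11 is a quick way to confirm that the newly built binary, and not an older system copy, is the one found on PATH.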
Test
Reference sequence data for E. coli K12 (a laboratory strain of Escherichia coli):
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.fna.gz
gzip -dc GCF_000005845.2_ASM584v2_genomic.fna.gz > E.coli_K12_MG1655.fa
Create an index with samtools:
[root@localhost ~]# samtools faidx E.coli_K12_MG1655.fa
[root@localhost ~]# ls
E.coli_K12_MG1655.fa E.coli_K12_MG1655.fa.fai
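The .fai file produced by faidx is plain tab-separated text with five columns per sequence: NAME, LENGTH, OFFSET, LINEBASES and LINEWIDTH. A minimal sketch for listing sequence names and lengths from it follows; fai_lengths is a hypothetical helper name.

```shell
# fai_lengths (hypothetical helper): print the NAME and LENGTH columns
# (columns 1 and 2) of a .fai index file given as the first argument.
fai_lengths() {
    cut -f1,2 "$1"
}

# Usage:
#   fai_lengths E.coli_K12_MG1655.fa.fai
```

The names printed here are the ones that can be used as the sequence part of a region argument to samtools faidx.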
With the index in place, the sequence at any position in the FASTA file, or any complete chromosome sequence, can be retrieved:
[root@localhost ~]# samtools faidx E.coli_K12_MG1655.fa NC_000913.3:1000000-1000200
>NC_000913.3:1000000-1000200
GTGTCAGCTTTCGTGGTGTGCAGCTGGCGTCAGATGACAACATGCTGCCAGACAGCCTGA
AAGGGTTTGCGCCTGTGGTGCGTGGTATCGCCAAAAGCAATGCCCAGATAACGATTAAGC
AAAATGGTTACACCATTTACCAAACTTATGTATCGCCTGGTGCTTTTGAAATTAGTGATC
TCTATTCCACGTCGTCGAGCG
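Region coordinates are 1-based and inclusive, so 1000000-1000200 covers 201 bases, which matches the output above (three lines of 60 bases plus one of 21). The sketch below checks this arithmetic and counts the bases in faidx output; both function names are hypothetical helpers, and region_length assumes a region of the form NAME:START-END without thousands separators.

```shell
# region_length (hypothetical helper): number of bases covered by a
# region string of the form NAME:START-END (1-based, inclusive).
region_length() {
    range="${1##*:}"      # text after the last colon, e.g. 1000000-1000200
    start="${range%-*}"   # part before the dash
    end="${range#*-}"     # part after the dash
    echo $(( end - start + 1 ))
}

# count_bases (hypothetical helper): total sequence characters in FASTA
# text read from stdin, ignoring header lines that start with ">".
count_bases() {
    awk '!/^>/ { n += length($0) } END { print n + 0 }'
}

# Usage:
#   region_length NC_000913.3:1000000-1000200
#   samtools faidx E.coli_K12_MG1655.fa NC_000913.3:1000000-1000200 | count_bases
```

If the two numbers agree, the extracted sequence has the expected length for the requested region.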