Genome-wide scans for positive selection with selscan

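selscan is a very practical program for genome-wide scans of positive selection. It takes a phased VCF as input and computes per-site haplotype-based statistics such as EHH, iHS and nSL; its companion program norm then normalizes the scores and summarizes them in windows of a specified size.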

Computing per-site scores, one scaffold at a time

selscan --nsl --vcf /path/to/file.vcf --out /path/to/outname

Normalizing the results above and summarizing them in fixed-size, non-overlapping windows

norm --nsl --files outname.nsl.out --bp-win --winsize 50000
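
Once norm has finished, the per-site output can be filtered for extreme scores. The sketch below is only an illustration: it assumes the normalized nSL score sits in column 7 of the norm output, which you should check against your own file, and the function name extreme_sites is made up for this example. The default threshold of 2 mirrors the --crit-val default.

```bash
#!/bin/bash
# Print sites whose normalized score exceeds a threshold in absolute value.
# Assumption: column 7 of the norm output holds the normalized nSL score;
# pass a different column number as the third argument if your file differs.
# Usage: extreme_sites <norm_file> [threshold] [score_column]
extreme_sites() {
    local norm_file=$1
    local threshold=${2:-2}
    local col=${3:-7}
    awk -v t="$threshold" -v c="$col" '
        {
            s = $c + 0
            if (s < 0) s = -s      # absolute value of the score
            if (s > t) print       # keep the whole line for extreme sites
        }
    ' "$norm_file"
}
```

For example, extreme_sites /path/to/normalized_file 2.5 would list only the sites with |score| above 2.5.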

Computing nSL for many scaffolds in one go with a shell loop

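#!/bin/bash

# Run selscan --nsl on every per-scaffold VCF in a directory.
# /path/to/list is a placeholder for that directory; files are assumed to end in .vcf.
shopt -s nullglob   # do nothing if no file matches
echo "PID: $$"
for file in /path/to/list/*.vcf
do
    echo "========== selscan for $file start at: $(date) =========="
    selscan --nsl --vcf "$file" --out "$file"
    echo "========== selscan for $file end at: $(date) =========="
done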
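
After the loop, the per-scaffold results can be handed to norm together, since --files accepts a whitespace-delimited list for joint normalization. The following is a rough sketch under two assumptions: the outputs are named <input>.nsl.out and sit in one directory, and the helper name norm_all_scaffolds is hypothetical. By default it only prints the command, so you can inspect it before running anything.

```bash
#!/bin/bash
# Build one norm command that normalizes all per-scaffold nSL files jointly.
# Assumption: selscan wrote its results as <input>.nsl.out inside <dir>.
# Usage: norm_all_scaffolds <dir> [winsize] [run]
norm_all_scaffolds() {
    local dir=$1
    local winsize=${2:-50000}
    local mode=${3:-dry-run}
    local f
    set --                               # reuse the positional parameters as a file list
    for f in "$dir"/*.nsl.out; do
        if [ -e "$f" ]; then             # skip the unexpanded pattern when nothing matches
            set -- "$@" "$f"
        fi
    done
    if [ "$#" -eq 0 ]; then
        echo "no *.nsl.out files found in $dir" >&2
        return 1
    fi
    if [ "$mode" = "run" ]; then
        norm --nsl --files "$@" --bp-win --winsize "$winsize"
    else
        # Dry run: only print the command that would be executed.
        echo norm --nsl --files "$@" --bp-win --winsize "$winsize"
    fi
}
```

Calling norm_all_scaffolds /path/to/list 50000 prints the command; adding run as a third argument executes it.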

Detailed norm options and output format, as given on the official site

To normalize selscan output across frequency bins:

./norm [--ihs|--xpehh|--nsl|--xpnsl|--ihh12] --files <file1.*.out> ... <fileN.*.out>

To normalize selscan output and analyze non-overlapping windows of fixed bp for
extreme scores:

./norm [--ihs|--xpehh|--nsl|--xpnsl|--ihh12] --files <file1.*.out> ... <fileN.*.out> --bp-win

----------Command Line Arguments----------

--bins <int>: The number of frequency bins in [0,1] for score normalization.
Default: 100

--bp-win <bool>: If set, will use windows of a constant bp size with varying
number of SNPs.
Default: false

--crit-percent <double>: Set the critical value such that a SNP with iHS in the most extreme CRIT_PERCENT tails (two-tailed) is marked as an extreme SNP.
Not used by default.
Default: -1.00

--crit-val <double>: Set the critical value such that a SNP with |iHS| > CRIT_VAL is marked as an extreme SNP. Default as in Voight et al.
Default: 2.00

--files <string1> ... <stringN>: A list of files delimited by whitespace for
joint normalization.
Expected format for iHS or nSL files (no header):
<locus name> <physical pos> <freq> <ihh1/sL1> <ihh2/sL2> <ihs/nsl>
Expected format for XP-EHH files (one line header):
<locus name> <physical pos> <genetic pos> <freq1> <ihh1> <freq2> <ihh2> <xpehh>
Expected format for iHH12 files (one line header):
<locus name> <physical pos> <freq1> <ihh12>
Default: infile

--first <bool>: Output only the first file's normalization.
Default: false

--help <bool>: Prints this help dialog.
Default: false

--ihh12 <bool>: Do ihh12 normalization.
Default: false

--ihs <bool>: Do iHS normalization.
Default: false

--log <string>: The log file name.
Default: logfile

--min-snps <int>: Only consider a bp window if it has at least this many SNPs.
Default: 10

--nsl <bool>: Do nSL normalization.
Default: false

--qbins <int>: Outlying windows are binned by number of sites within each
window. This is the number of quantile bins to use.
Default: 10

--winsize <int>: The non-overlapping window size for calculating the percentage
of extreme SNPs.
Default: 100000

--xpehh <bool>: Do XP-EHH normalization.
Default: false

--xpnsl <bool>: Do XP-nSL normalization.
Default: false
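
To make the --bins option more concrete, here is a small sketch of the idea behind frequency-bin normalization: scores are standardized (mean subtracted, divided by the standard deviation) only against other sites in the same allele-frequency bin. This is an illustration of the concept, not norm's actual implementation; the four-column input layout, the use of the population standard deviation, and the function name normalize_by_freq_bin are all assumptions made for the example.

```bash
#!/bin/bash
# Standardize scores within allele-frequency bins (two passes over the file).
# Assumed input columns: <locus> <pos> <freq> <score>
# Output: each input line plus a tab and the standardized score
#         (NA when the bin has zero variance, e.g. a single site).
# Usage: normalize_by_freq_bin <file> [bins]
normalize_by_freq_bin() {
    local infile=$1
    local bins=${2:-100}
    awk -v bins="$bins" '
        function bin_of(freq,   b) {
            b = int(freq * bins)
            if (b >= bins) b = bins - 1   # put freq == 1 into the last bin
            return b
        }
        NR == FNR {                       # pass 1: accumulate per-bin sums
            b = bin_of($3)
            n[b]++; sum[b] += $4; sumsq[b] += $4 * $4
            next
        }
        {                                 # pass 2: z-score within the bin
            b = bin_of($3)
            mean = sum[b] / n[b]
            var = sumsq[b] / n[b] - mean * mean
            if (var > 0) printf "%s\t%.4f\n", $0, ($4 - mean) / sqrt(var)
            else         printf "%s\tNA\n", $0
        }
    ' "$infile" "$infile"
}
```

The file is passed to awk twice so that the bin means and variances are known before any score is standardized.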

norm v1.3.0 - Now supports --xpnsl flag, which is identical to using --xpehh.
--qbins now has a default value of 10 instead of 20.
--bp-win analyses have been changed when analyzing XP-EHH and XP-nSL scores. Since positive scores suggest adaptation in the first (non-ref) population and negative scores suggest adaptation in the second (ref) population, we split windows into those enriched for extreme positive scores and those enriched for extreme negative scores.
min and max scores are given for each window for XP statistics, and the max |score| is reported for iHS and nSL stats.

*.windows output files therefore have additional columns:

For XP stats:

<win start> <win end> <# scores in win> <frac scores gt threshold> <frac scores lt threshold> <approx percentile for gt threshold wins> <approx percentile for lt threshold wins> <max score> <min score>

For iHS and nSL:

<win start> <win end> <# scores in win> <frac scores gt threshold> <approx percentile for gt threshold wins> <max score>
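
A quick way to pick candidate regions from a *.windows file is to rank the windows by the fraction of extreme scores. The sketch below uses only the first four columns (<win start> <win end> <# scores in win> <frac scores gt threshold>), which appear in the same order in both layouts above. The function name top_windows is hypothetical, and whether this ranking suits your analysis is a judgment call rather than a recommendation from the selscan authors.

```bash
#!/bin/bash
# List the windows with the highest fraction of extreme scores.
# Only column 4 (<frac scores gt threshold>) is used for ranking.
# Usage: top_windows <windows_file> [n]
top_windows() {
    local windows_file=$1
    local n=${2:-10}
    # Sort numerically on column 4, largest first, then keep the first n rows.
    sort -k4,4nr "$windows_file" | awk -v n="$n" 'NR <= n'
}
```

For example, top_windows /path/to/file.windows 20 would print the 20 windows most enriched for extreme scores.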

Explanation of the XP-nSL output on the official site

https://github.com/szpiech/selscan/issues/68

selscan official site: https://github.com/szpiech/selscan
Original paper: https://doi.org/10.1093/molbev/msu211