SPANR Splicing-based Analysis of Variants

SPANR addresses a key unmet challenge in genomics research, which is to ascertain how single nucleotide variations (SNVs) cause splicing misregulation and may lead to disease. This tool can analyze synonymous, missense and nonsense exonic SNVs, as well as intronic SNVs that are up to 300nt from splice junctions.


Because of computational constraints, this tool can analyze at most 40 variants at a time. If you wish to analyze more variants, for example in a genome-wide analysis, we also offer a pre-computed index of splicing variants called SPIDEX, downloadable from ANNOVAR. Please note that we made some approximations in order to compute the genome-wide SPIDEX index efficiently, which results in some small differences between SPANR and SPIDEX. We nevertheless believe that SPIDEX is good enough for general-purpose usage.
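If you have somewhat more than 40 variants and still want to use the web tool, one option is to split your list into several submissions. The following is a minimal Python sketch of that idea. The function name batches and the placeholder variant strings are illustrative assumptions and are not part of the tool.

```python
from typing import Iterator, List

MAX_VARIANTS_PER_JOB = 40  # per-submission limit stated above


def batches(lines: List[str], size: int = MAX_VARIANTS_PER_JOB) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` variant lines."""
    for start in range(0, len(lines), size):
        yield lines[start:start + size]


# Placeholder strings standing in for real TAB-separated variant lines.
variants = [f"variant_{i}" for i in range(95)]
chunks = list(batches(variants))
print([len(chunk) for chunk in chunks])
```

Each chunk can then be pasted into the input box as a separate job.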

If you use this tool in research, please cite

Hui Y. Xiong, Babak Alipanahi, Leo J. Lee, Hannes Bretschneider, Daniele Merico, Ryan K.C. Yuen, Yimin Hua, Serge Gueroussov, Hamed S. Najafabadi, Timothy R. Hughes, Quaid Morris, Yoseph Barash, Adrian R. Krainer, Nebojsa Jojic, Stephen W. Scherer, Benjamin J. Blencowe, Brendan J. Frey*,
The human splicing code reveals new insights into the genetic determinants of disease.
Science DOI: 10.1126/science.1254806. Published Online December 18 2014.

How the tool works

For an SNV input using the box below, the tool first examines RefSeq transcripts to locate exons on the reference genome (hg19) whose inclusion levels may be affected. For each exon, the tool extracts 1393 features from proximal DNA sequence and uses a computational model to predict the percent of transcripts with the exon spliced in (PSI) for each of 16 human tissues, using both the wildtype (reference genome) and mutated sequences. The tool reports the maximum mutation-induced change in PSI across 16 tissues, how this value compares to those for common SNPs in the form of percentiles, and the predicted average wildtype PSI in the 16 tissues.
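The two summary values described above can be illustrated with a short Python sketch. It does not reproduce the feature extraction or the predictive model. It only assumes that per-tissue PSI predictions (in percent) are already available for the wildtype and mutated sequences. Interpreting "maximum change" as the signed change with the largest magnitude is an assumption made for this illustration, and the numbers are made up.

```python
from typing import Sequence


def max_delta_psi(psi_wt: Sequence[float], psi_mut: Sequence[float]) -> float:
    """Return the signed per-tissue change (mutant minus wildtype) of largest magnitude."""
    if len(psi_wt) != len(psi_mut) or not psi_wt:
        raise ValueError("need one wildtype and one mutant PSI value per tissue")
    deltas = [mut - wt for wt, mut in zip(psi_wt, psi_mut)]
    return max(deltas, key=abs)


def mean_wildtype_psi(psi_wt: Sequence[float]) -> float:
    """Return the average predicted wildtype PSI across tissues."""
    return sum(psi_wt) / len(psi_wt)


# Made-up predictions for three tissues (the tool itself uses 16).
wildtype = [80.0, 75.0, 90.0]
mutant = [70.0, 74.0, 88.0]
print(max_delta_psi(wildtype, mutant))
print(mean_wildtype_psi(wildtype))
```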

This tool was designed for cassette alternative splicing. It may or may not work for other types.

Context dependence

To account for the dependence of an exon’s splicing regulation on cis-context, the tool examines DNA sequences from the exon, its flanking introns, and its adjacent exons. In this way, the tool can make predictions that are more accurate than simple ESE/ESS analysis, and that may differ from minigene reporter assays, which usually use highly truncated introns and omit adjacent exons.


The splicing code is computationally expensive, so you may have to wait for your results.

We cache the results of variants that we compute, so if your task consists entirely of SNVs that we have already computed, it will complete instantaneously. If some of the variants that you input haven't been computed yet, we will need to run the splicing code for them. Once your job is running, a single SNV will take ~4 minutes, whereas 10 SNVs will take ~5 minutes, so it is best to submit batches of SNVs.

Input format

There are two ways of entering SNVs. You may enter one SNV at a time into the five smaller form fields above the larger text box and then click the + button, which adds it to the field below. This formats the input correctly for you. You can also enter or copy & paste VCF data directly into the large text area.

You may enter between one and forty SNVs using the variant call format (VCF). Briefly, each SNV is on a different line and each line contains five TAB delimited entries:

  1. the chromosome CHROM,
  2. the position of the SNV POS (1-offset),
  3. a short user-selected name ID (<20 characters, no white space or special symbols),
  4. the reference allele REF (A/C/G/T),
  5. and the mutated allele ALT (A/C/G/T).

Both REF and ALT are given with regard to the forward (+) strand. The ID may be . (a dot), in which case the web tool will use VAR_N, where N is the line number. REF must match the hg19 reference genome.
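The format rules above can be checked locally before submitting. The following Python sketch is one possible check and is not the validation the web tool performs. The names SNV and parse_snvs are hypothetical. Treating "no special symbols" as letters, digits and underscores, and skipping blank lines and lines starting with #, are assumptions. The sketch does not verify that REF matches hg19.

```python
import re
from typing import List, NamedTuple


class SNV(NamedTuple):
    chrom: str
    pos: int
    name: str
    ref: str
    alt: str


BASES = {"A", "C", "G", "T"}
# Assumption: "no special symbols" means letters, digits and underscores only.
ID_PATTERN = re.compile(r"^[A-Za-z0-9_]{1,19}$")


def parse_snvs(text: str) -> List[SNV]:
    """Parse TAB-separated CHROM, POS, ID, REF, ALT lines and check the basic rules."""
    snvs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # assumption: blank and comment lines are ignored
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(f"line {line_no}: expected 5 TAB-separated fields")
        chrom, pos, name, ref, alt = fields
        if not pos.isdigit() or int(pos) < 1:
            raise ValueError(f"line {line_no}: POS must be a positive integer")
        if ref not in BASES or alt not in BASES:
            raise ValueError(f"line {line_no}: REF and ALT must each be A, C, G or T")
        if name == ".":
            name = f"VAR_{line_no}"  # default ID described above
        elif not ID_PATTERN.match(name):
            raise ValueError(f"line {line_no}: invalid ID")
        snvs.append(SNV(chrom, int(pos), name, ref, alt))
    if not 1 <= len(snvs) <= 40:
        raise ValueError("expected between 1 and 40 SNVs")
    return snvs


parsed = parse_snvs("17\t19566814\tHGMD_Intronic\tT\tG\n5\t70247773\t.\tC\tT")
print(parsed)
```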

Always ensure that your input is TAB separated. When you copy & paste from other sources, TAB characters are sometimes converted to spaces.
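If pasted text has lost its TAB characters, they can be restored before submission. The Python sketch below is one way to do this. It relies on the fact that none of the five fields may contain white space. The function name spaces_to_tabs is hypothetical, and leaving lines that start with # unchanged is an assumption.

```python
def spaces_to_tabs(text: str) -> str:
    """Rejoin the white-space-separated fields of each variant line with single TABs."""
    fixed = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            fixed.append(line)  # keep blank and comment lines unchanged
        else:
            fixed.append("\t".join(line.split()))
    return "\n".join(fixed)


repaired = spaces_to_tabs("17  19566814 HGMD_Intronic T   G")
print(repr(repaired))
```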


To try out the tool, you may wish to copy and paste some of the following examples, which include intronic, missense and synonymous mutations, and mutations found in patients with spinal muscular atrophy, Lynch syndrome and autism spectrum disorders (ASDs). Note that the columns are TAB separated.

17	19566814	HGMD_Intronic	T	G
11	103116103	HGMD_Sense_1	G	T
17	17125815	HGMD_Sense_2	C	T
5	70247773	SMN1_Synonymous	C	T
5	70247921	SMN1_Intronic	A	G
8	43172532	dbSNP_Missense	C	G
11	112101362	HGMD_Missense	C	A
1	206666584	dbSNP_Intronic	A	G
3	37042444	Lynch_MLH1_Intronic	A	G
3	37090087	Lynch_MLH1_Exonic	G	A
2	47639550	Lynch_MSH2_Intronic	T	G
2	47698180	Lynch_MSH2_Exonic	G	T
12	1984413	ASD_1	T	C
19	9009305	ASD_2	G	A
8	23060238	ASD_3	C	T
15	75983008	ASD_4	G	C

Below are additional mutation examples for Lynch syndrome, spinal muscular atrophy and cystic fibrosis.

Each ID indicates whether or not the mutation altered splicing in a minigene reporter (POS or NEG), so the experimental outcome can be compared to the splicing code prediction.
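Because the experimental outcome is the last underscore-separated part of each ID, it can be recovered programmatically when comparing against predictions. A minimal Python sketch follows. The function name expected_label is hypothetical.

```python
def expected_label(variant_id: str) -> str:
    """Return the trailing label of an example ID, such as POS, NEG, Up or Down."""
    return variant_id.rsplit("_", 1)[-1]


labels = [expected_label(v) for v in ("MLH1_01086_POS", "MLH1_00967_NEG", "SMN2_exon7_39G_Up")]
print(labels)
```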

Lynch syndrome mutations (nonpolyposis colorectal cancer)
# POS: Increased skipping observed
# NEG: No significant skipping observed
# The transcripts used in RT-PCR studies are:
# MLH1: NM_000249
# MSH2: NM_000251
3	37038108	MLH1_01086_POS	A	T
3	37038114	MLH1_00967_NEG	G	C
3	37038201	MLH1_01254_POS	G	T
3	37042468	MLH1_00148_NEG	G	A
3	37042549	MLH1_00175_POS	G	A
3	37045967	MLH1_01299_POS	T	C
3	37050315	MLH1_00240_NEG	T	G
3	37053560	MLH1_01332_NEG	T	G
3	37053590	MLH1_00284_POS	G	T
3	37058990	MLH1_01512_POS	T	A
3	37061919	MLH1_01057_NEG	C	T
3	37067465	MLH1_01100_NEG	C	T
3	37070437	MLH1_00540_NEG	G	A
3	37081674	MLH1_01140_POS	C	G
3	37083822	MLH1_00598_POS	G	A
3	37089133	MLH1_00919_NEG	G	C
3	37090028	MLH1_01224_POS	A	G
3	37090087	MLH1_00685_POS	G	T
3	37090432	MLH1_00849_NEG	T	C
3	37090471	MLH1_00749_NEG	A	G
2	47635695	MSH2_01210_POS	G	T
2	47639700	MSH2_00224_POS	G	A
2	47641406	MSH2_01279_POS	A	C
2	47641560	MSH2_00260_POS	A	T
2	47657082	MSH2_00911_POS	T	A
2	47693802	MSH2_00430_POS	G	T
2	47693888	MSH2_00988_NEG	T	A
2	47693952	MSH2_00455_POS	G	C
2	47698108	MSH2_00482_NEG	T	C
2	47698123	MSH2_00470_NEG	G	A
2	47702310	MSH2_00508_NEG	G	C
2	47702316	MSH2_01056_NEG	A	G
2	47702337	MSH2_01058_NEG	C	G
2	47702410	MSH2_01068_POS	G	T
2	47703587	MSH2_01101_NEG	C	T
2	47705442	MSH2_01138_NEG	G	T
2	47705445	MSH2_00631_NEG	G	A
2	47707893	MSH2_01171_NEG	T	A
2	47708015	MSH2_01181_POS	G	C
2	47708015	MSH2_01182_POS	G	T

Spinal muscular atrophy mutations
# Up: increased PSI observed
# Down: decreased PSI observed
5	69372386	SMN2_exon7_39G_Up	T	G
5	69372386	SMN2_exon7_39A_Up	T	A
5	69372387	SMN2_exon7_40A_Up	T	A
5	69372387	SMN2_exon7_40G_Up	T	G
5	69372387	SMN2_exon7_40C_Up	T	C
5	69372388	SMN2_exon7_41A_Down	C	A
5	69372388	SMN2_exon7_41U_Down	C	T
5	69372388	SMN2_exon7_41G_Up	C	G
5	69372389	SMN2_exon7_42A_Up	C	A
5	69372389	SMN2_exon7_42U_Down	C	T
5	69372389	SMN2_exon7_42G_Up	C	G
5	69372390	SMN2_exon7_43A_Up	T	A
5	69372390	SMN2_exon7_43C_Up	T	C
5	69372390	SMN2_exon7_43G_Up	T	G
5	69372391	SMN2_exon7_44G_Up	T	G
5	69372391	SMN2_exon7_44C_Up	T	C
5	69372391	SMN2_exon7_44A_Up	T	A
5	69372392	SMN2_exon7_45U_Down	A	T
5	69372392	SMN2_exon7_45G_Up	A	G
5	69372392	SMN2_exon7_45C_Up	A	C
5	69372396	SMN2_exon7_49A_Up	T	A
5	69372396	SMN2_exon7_49C_Up	T	C
5	69372396	SMN2_exon7_49G_Up	T	G
5	69372397	SMN2_exon7_50C_Up	A	C
5	69372397	SMN2_exon7_50G_Up	A	G
5	69372397	SMN2_exon7_50U_Down	A	T
5	69372398	SMN2_exon7_51U_Up	A	T
5	69372398	SMN2_exon7_51G_Down	A	G
5	69372398	SMN2_exon7_51C_Up	A	C
5	69372399	SMN2_exon7_52U_Up	G	T
5	69372399	SMN2_exon7_52A_Up	G	A
5	69372399	SMN2_exon7_52C_Up	G	C
5	69372400	SMN2_exon7_53C_Up	G	C
5	69372401	SMN2_exon7_54C_Up	A	C
5	69372401	SMN2_exon7_54G_Up	A	G

Cystic Fibrosis Transmembrane Regulator mutations
# Up: increased PSI observed
# Down: decreased PSI observed
7	117230419	CFTR_exon12_13G_Up	A	G
7	117230422	CFTR_exon12_16C_Up	T	C
7	117230425	CFTR_exon12_19A_Down	T	A
7	117230425	CFTR_exon12_19G_Up	T	G
7	117230425	CFTR_exon12_19C_Up	T	C
7	117230428	CFTR_exon12_22C_Up	T	C
7	117230431	CFTR_exon12_25A_Down	G	A
7	117230434	CFTR_exon12_28C_Up	T	C
7	117230440	CFTR_exon12_34G_Up	A	G
7	117230443	CFTR_exon12_37T_Up	C	T
7	117230446	CFTR_exon12_40C_Down	T	C
7	117230446	CFTR_exon12_40A_Up	T	A
7	117230446	CFTR_exon12_40G_Up	T	G
7	117230449	CFTR_exon12_43A_Up	T	A
7	117230449	CFTR_exon12_43G_Up	T	G
7	117230449	CFTR_exon12_43C_Up	T	C
7	117230455	CFTR_exon12_49G_Down	A	G
7	117230455	CFTR_exon12_49T_Down	A	T
7	117230458	CFTR_exon12_52T_Down	C	T

Result interpretation

We have received many emails requesting guidelines for interpreting the results, especially a threshold for deciding when a variant is predicted to disrupt splicing. We have tried several cutoffs in our original paper and elsewhere with some success, such as |dPSI| >= 5 when performing genome-wide analysis of variants, the bottom 2nd and 3rd percentiles when performing the ASD analysis, and |z| >= 2 when using SPIDEX. However, we have to acknowledge that the appropriate cutoff can be highly problem dependent and will require further research and improvements on our end too. We therefore encourage users of SPANR/SPIDEX to explore cutoffs suited to their applications, taking our experience as a reference.
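As a starting point for such exploration, the cutoffs on the change in PSI and on the z-score mentioned above can be applied with a few lines of Python. This is only a sketch. The function names and the example values are hypothetical, and the cutoffs are deliberately left as parameters because the appropriate value depends on the application.

```python
def exceeds_dpsi_cutoff(dpsi: float, cutoff: float = 5.0) -> bool:
    """True if the absolute change in PSI reaches the cutoff (default 5)."""
    return abs(dpsi) >= cutoff


def exceeds_z_cutoff(z: float, cutoff: float = 2.0) -> bool:
    """True if the absolute z-score reaches the cutoff (default 2)."""
    return abs(z) >= cutoff


# Hypothetical (ID, change in PSI) pairs, not real tool output.
results = [("var_a", -7.2), ("var_b", 1.5), ("var_c", 5.0)]
flagged = [name for name, dpsi in results if exceeds_dpsi_cutoff(dpsi)]
print(flagged)
```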

SNV input

This website may be freely used for research where all findings are made publicly available. Please contact us for a commercial license.