SPANR Splicing-based Analysis of Variants
SPANR addresses a key unmet challenge in genomics research: ascertaining how single nucleotide variations (SNVs) cause splicing misregulation and may lead to disease. This tool can analyze synonymous, missense and nonsense exonic SNVs, as well as intronic SNVs that are up to 300 nt from splice junctions.
Try SPIDEX
Because of computational constraints, this tool is limited to analyzing a maximum of 40 variants at a time. If you wish to analyze more variants, for example in a genome-wide analysis, we also offer a pre-computed index of splicing variants called SPIDEX, downloadable from ANNOVAR. Please note that we made some approximations in order to compute the genome-wide SPIDEX index efficiently. These result in some small differences between SPANR and SPIDEX, but we believe that SPIDEX is good enough for general-purpose usage.
If you use this tool in research, please cite:
Hui Y. Xiong, Babak Alipanahi, Leo J. Lee, Hannes Bretschneider, Daniele Merico, Ryan K.C. Yuen, Yimin Hua, Serge Gueroussov, Hamed S. Najafabadi, Timothy R. Hughes, Quaid Morris, Yoseph Barash, Adrian R. Krainer, Nebojsa Jojic, Stephen W. Scherer, Benjamin J. Blencowe, Brendan J. Frey*. The human splicing code reveals new insights into the genetic determinants of disease. Science. DOI: 10.1126/science.1254806. Published online December 18, 2014.
How the tool works
For an SNV input using the box below, the tool first examines RefSeq transcripts to locate exons on the reference genome (hg19) whose inclusion levels may be affected. For each exon, the tool extracts 1393 features from proximal DNA sequence and uses a computational model to predict the percent of transcripts with the exon spliced in (PSI) for each of 16 human tissues, using both the wildtype (reference genome) and mutated sequences. The tool reports the maximum mutation-induced change in PSI across 16 tissues, how this value compares to those for common SNPs in the form of percentiles, and the predicted average wildtype PSI in the 16 tissues.
This tool was designed for cassette alternative splicing. It may or may not work for other types of splicing events.
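The three reported quantities can be illustrated with a minimal Python sketch. It is not the tool's implementation: the function names are hypothetical, the "maximum change" is assumed to mean the signed tissue-wise difference with the largest magnitude, and the percentile is assumed to be the share of common-SNP values at or below the variant's value. The per-tissue PSI predictions themselves come from the splicing model and are simply passed in as lists here.

```python
from bisect import bisect_right


def max_delta_psi(psi_wt, psi_mut):
    """Signed PSI change (mutant minus wildtype) with the largest magnitude.

    Assumption: 'maximum change across tissues' means largest absolute value.
    """
    if len(psi_wt) != len(psi_mut):
        raise ValueError("wildtype and mutant PSI lists must have equal length")
    deltas = [mut - wt for wt, mut in zip(psi_wt, psi_mut)]
    return max(deltas, key=abs)


def percentile_vs_common_snps(delta, common_snp_deltas):
    """Percent of common-SNP changes that are <= delta (assumed definition)."""
    ranked = sorted(common_snp_deltas)
    return 100.0 * bisect_right(ranked, delta) / len(ranked)


def mean_wildtype_psi(psi_wt):
    """Average predicted wildtype PSI over the supplied tissues."""
    return sum(psi_wt) / len(psi_wt)
```

Under these assumptions, a strongly negative change lands in a low percentile, meaning few common SNPs are predicted to reduce exon inclusion that much.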
Context dependence
To account for the dependence of an exon’s splicing regulation on cis-context, the tool examines DNA sequences from the exon, its flanking introns, and its adjacent exons. In this way, the tool can make predictions that are more accurate than simple ESE/ESS analysis, and that may differ from minigene reporter assays, which usually use highly truncated introns and omit adjacent exons.
Runtime
The splicing code is computationally expensive and you may have to wait until your results are available.
We cache the results of variants that we compute, so if your task consists entirely of SNVs that we have already computed, it will complete instantaneously. If some of the variants that you input haven't been computed yet, we will need to run the splicing code for them. Once your job is running, a single SNV will take ~4 minutes, whereas 10 SNVs will take ~5 minutes, so it is best to submit batches of SNVs.
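Since one SNV takes about 4 minutes and ten take about 5, most of the cost appears to be per job rather than per variant, so filling each submission up to the 40-variant limit is likely the most efficient approach. The helper below is a hypothetical sketch (not part of the tool) that splits a longer list of VCF lines into submission-sized batches.

```python
def batch_variants(vcf_lines, max_per_job=40):
    """Split VCF data lines into lists of at most max_per_job variants.

    Blank lines and '#' comment/header lines are dropped before batching.
    """
    variants = [
        line for line in vcf_lines
        if line.strip() and not line.lstrip().startswith("#")
    ]
    return [
        variants[start:start + max_per_job]
        for start in range(0, len(variants), max_per_job)
    ]
```

For example, 95 variants would be split into batches of 40, 40 and 15.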
Input format
There are two ways of entering SNVs. You may enter one SNV at a time into the five smaller form fields above the larger text box and then click the + button, which adds it to the text box below with the correct formatting. Alternatively, you can type or copy & paste VCF data directly into the large text area.
You may enter between one and forty SNVs using the variant call format (VCF) described at vcftools.sourceforge.net/specs.html. Briefly, each SNV is on a different line and each line contains five TAB delimited entries:
- CHROM: the chromosome,
- POS: the position of the SNV (1-offset),
- ID: a short user-selected name (<20 characters, no white space or special symbols),
- REF: the reference allele (A/C/G/T),
- ALT: and the mutated allele (A/C/G/T).
Both REF and ALT are given with regard to the forward (+) strand. The ID may be . (a dot), in which case the web tool will use VAR_N, where N is the line number. REF must match the hg19 reference genome.
Always ensure that your input is TAB separated. When you copy & paste from other sources, TAB characters are sometimes converted to spaces.
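The format rules above can be checked before submission with a small script. This is a hedged sketch, not the web tool's own validation: the function name is hypothetical, "no special symbols" is assumed to mean letters, digits and underscores only, and the REF allele is not checked against hg19. The length limit is taken literally from the text (fewer than 20 characters). A few of the longer example IDs on this page are exactly 20 characters, so the limit is exposed as a parameter.

```python
import re

VALID_BASES = {"A", "C", "G", "T"}


def normalize_vcf_line(line, line_number, max_id_len=19):
    """Return a TAB-joined five-column record, or raise ValueError.

    Runs of spaces or tabs are accepted as separators and rewritten as
    single TABs, which repairs input whose TABs were converted to spaces.
    """
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"line {line_number}: expected 5 fields, got {len(fields)}")
    chrom, pos, var_id, ref, alt = fields
    if not re.fullmatch(r"[1-9][0-9]*", pos):
        raise ValueError(f"line {line_number}: POS must be a positive integer")
    if var_id == ".":
        var_id = f"VAR_{line_number}"  # default name described in the text
    elif not re.fullmatch(r"[A-Za-z0-9_]+", var_id) or len(var_id) > max_id_len:
        raise ValueError(f"line {line_number}: invalid ID {var_id!r}")
    ref, alt = ref.upper(), alt.upper()
    if ref not in VALID_BASES or alt not in VALID_BASES:
        raise ValueError(f"line {line_number}: REF and ALT must each be A, C, G or T")
    if ref == alt:
        raise ValueError(f"line {line_number}: REF and ALT are identical")
    return "\t".join([chrom, pos, var_id, ref, alt])
```

Passing a space-separated line such as `17 19566814 HGMD_Intronic T G` returns the same record with TAB separators.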
Examples
To try out the tool, you may wish to copy and paste some of the following examples, which include intronic, missense and synonymous mutations, and mutations found in patients with spinal muscular atrophy, Lynch syndrome and autism spectrum disorders (ASDs). Note that the columns are TAB separated.
#CHROM	POS	ID	REF	ALT
17	19566814	HGMD_Intronic	T	G
11	103116103	HGMD_Sense_1	G	T
17	17125815	HGMD_Sense_2	C	T
5	70247773	SMN1_Synonymous	C	T
5	70247921	SMN1_Intronic	A	G
8	43172532	dbSNP_Missense	C	G
11	112101362	HGMD_Missense	C	A
1	206666584	dbSNP_Intronic	A	G
3	37042444	Lynch_MLH1_Intronic	A	G
3	37090087	Lynch_MLH1_Exonic	G	A
2	47639550	Lynch_MSH2_Intronic	T	G
2	47698180	Lynch_MSH2_Exonic	G	T
12	1984413	ASD_1	T	C
19	9009305	ASD_2	G	A
8	23060238	ASD_3	C	T
15	75983008	ASD_4	G	C
Below are additional mutation examples for Lynch syndrome, spinal muscular atrophy and cystic fibrosis.
Each ID indicates whether or not the mutation altered splicing in a minigene reporter (POS or NEG), so that the observed outcome can be compared to the splicing code prediction.
Lynch syndrome mutations (nonpolyposis colorectal cancer)
# POS: Increased skipping observed
# NEG: No significant skipping observed
# The transcripts used in RT-PCR studies are:
# MLH1: NM_000249
# MSH2: NM_000251
# CHR	POS	ID	REF	ALT
3	37038108	MLH1_01086_POS	A	T
3	37038114	MLH1_00967_NEG	G	C
3	37038201	MLH1_01254_POS	G	T
3	37042468	MLH1_00148_NEG	G	A
3	37042549	MLH1_00175_POS	G	A
3	37045967	MLH1_01299_POS	T	C
3	37050315	MLH1_00240_NEG	T	G
3	37053560	MLH1_01332_NEG	T	G
3	37053590	MLH1_00284_POS	G	T
3	37058990	MLH1_01512_POS	T	A
3	37061919	MLH1_01057_NEG	C	T
3	37067465	MLH1_01100_NEG	C	T
3	37070437	MLH1_00540_NEG	G	A
3	37081674	MLH1_01140_POS	C	G
3	37083822	MLH1_00598_POS	G	A
3	37089133	MLH1_00919_NEG	G	C
3	37090028	MLH1_01224_POS	A	G
3	37090087	MLH1_00685_POS	G	T
3	37090432	MLH1_00849_NEG	T	C
3	37090471	MLH1_00749_NEG	A	G
2	47635695	MSH2_01210_POS	G	T
2	47639700	MSH2_00224_POS	G	A
2	47641406	MSH2_01279_POS	A	C
2	47641560	MSH2_00260_POS	A	T
2	47657082	MSH2_00911_POS	T	A
2	47693802	MSH2_00430_POS	G	T
2	47693888	MSH2_00988_NEG	T	A
2	47693952	MSH2_00455_POS	G	C
2	47698108	MSH2_00482_NEG	T	C
2	47698123	MSH2_00470_NEG	G	A
2	47702310	MSH2_00508_NEG	G	C
2	47702316	MSH2_01056_NEG	A	G
2	47702337	MSH2_01058_NEG	C	G
2	47702410	MSH2_01068_POS	G	T
2	47703587	MSH2_01101_NEG	C	T
2	47705442	MSH2_01138_NEG	G	T
2	47705445	MSH2_00631_NEG	G	A
2	47707893	MSH2_01171_NEG	T	A
2	47708015	MSH2_01181_POS	G	C
2	47708015	MSH2_01182_POS	G	T
Spinal muscular atrophy mutations
# Up: increased PSI observed
# Down: decreased PSI observed
5	69372386	SMN2_exon7_39G_Up	T	G
5	69372386	SMN2_exon7_39A_Up	T	A
5	69372387	SMN2_exon7_40A_Up	T	A
5	69372387	SMN2_exon7_40G_Up	T	G
5	69372387	SMN2_exon7_40C_Up	T	C
5	69372388	SMN2_exon7_41A_Down	C	A
5	69372388	SMN2_exon7_41U_Down	C	T
5	69372388	SMN2_exon7_41G_Up	C	G
5	69372389	SMN2_exon7_42A_Up	C	A
5	69372389	SMN2_exon7_42U_Down	C	T
5	69372389	SMN2_exon7_42G_Up	C	G
5	69372390	SMN2_exon7_43A_Up	T	A
5	69372390	SMN2_exon7_43C_Up	T	C
5	69372390	SMN2_exon7_43G_Up	T	G
5	69372391	SMN2_exon7_44G_Up	T	G
5	69372391	SMN2_exon7_44C_Up	T	C
5	69372391	SMN2_exon7_44A_Up	T	A
5	69372392	SMN2_exon7_45U_Down	A	T
5	69372392	SMN2_exon7_45G_Up	A	G
5	69372392	SMN2_exon7_45C_Up	A	C
5	69372396	SMN2_exon7_49A_Up	T	A
5	69372396	SMN2_exon7_49C_Up	T	C
5	69372396	SMN2_exon7_49G_Up	T	G
5	69372397	SMN2_exon7_50C_Up	A	C
5	69372397	SMN2_exon7_50G_Up	A	G
5	69372397	SMN2_exon7_50U_Down	A	T
5	69372398	SMN2_exon7_51U_Up	A	T
5	69372398	SMN2_exon7_51G_Down	A	G
5	69372398	SMN2_exon7_51C_Up	A	C
5	69372399	SMN2_exon7_52U_Up	G	T
5	69372399	SMN2_exon7_52A_Up	G	A
5	69372399	SMN2_exon7_52C_Up	G	C
5	69372400	SMN2_exon7_53C_Up	G	C
5	69372401	SMN2_exon7_54C_Up	A	C
5	69372401	SMN2_exon7_54G_Up	A	G
Cystic fibrosis transmembrane conductance regulator (CFTR) mutations
# Up: increased PSI observed
# Down: decreased PSI observed
7	117230419	CFTR_exon12_13G_Up	A	G
7	117230422	CFTR_exon12_16C_Up	T	C
7	117230425	CFTR_exon12_19A_Down	T	A
7	117230425	CFTR_exon12_19G_Up	T	G
7	117230425	CFTR_exon12_19C_Up	T	C
7	117230428	CFTR_exon12_22C_Up	T	C
7	117230431	CFTR_exon12_25A_Down	G	A
7	117230434	CFTR_exon12_28C_Up	T	C
7	117230440	CFTR_exon12_34G_Up	A	G
7	117230443	CFTR_exon12_37T_Up	C	T
7	117230446	CFTR_exon12_40C_Down	T	C
7	117230446	CFTR_exon12_40A_Up	T	A
7	117230446	CFTR_exon12_40G_Up	T	G
7	117230449	CFTR_exon12_43A_Up	T	A
7	117230449	CFTR_exon12_43G_Up	T	G
7	117230449	CFTR_exon12_43C_Up	T	C
7	117230455	CFTR_exon12_49G_Down	A	G
7	117230455	CFTR_exon12_49T_Down	A	T
7	117230458	CFTR_exon12_52T_Down	C	T
Result interpretation
We have received many emails requesting guidelines for interpreting the results, in particular a threshold for deciding when a variant is predicted to disrupt splicing. We have tried several cutoffs in our original paper and elsewhere with some success: |dPSI| >= 5 when performing genome-wide analysis of variants, the bottom 2nd and 3rd percentiles when performing the ASD analysis, and |z| >= 2 when using SPIDEX. However, we have to acknowledge that the appropriate cutoff can be highly problem dependent and will require further research and improvements on our end. We therefore encourage users of SPANR/SPIDEX to explore cutoffs suited to their own applications, taking our experience as a reference.
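As a starting point for such exploration, the example cutoffs mentioned above can be wrapped in a small filter whose thresholds are easy to change. This is only a sketch: the function name is hypothetical, the defaults are the illustrative values from the text rather than recommended settings, and the change in PSI is assumed to be expressed in percentage points.

```python
def flag_variant(dpsi=None, zscore=None, dpsi_cutoff=5.0, z_cutoff=2.0):
    """Return the names of the example criteria that a variant meets.

    dpsi   -- predicted change in PSI (assumed to be in percentage points)
    zscore -- z-score of that change, as used with SPIDEX
    Either value may be None if it is not available.
    """
    hits = []
    if dpsi is not None and abs(dpsi) >= dpsi_cutoff:
        hits.append("dPSI")
    if zscore is not None and abs(zscore) >= z_cutoff:
        hits.append("z")
    return hits
```

Keeping the cutoffs as parameters makes it straightforward to rerun the same variant list under stricter or looser settings and compare the outcomes.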