Analysis:Radish dCAPS

From RadishDB

Planning

  • Dworkin and Shim have worked out a pipeline.

SNP alignment file processing

  • Get the SNP positions where the 2nd most abundant nucleotide has a count >= 3, together with 10 bp of flanking sequence
python ~/project/radish/_script/script_6.1_parse_align.py align_snp.all 3 10
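
The filtering rule above can be illustrated with a minimal sketch. This is not the code of script_6.1_parse_align.py, whose internals and input format are not shown here. It assumes the alignment is already loaded as a list of equal-length strings, and the function name `snp_columns` and the choice to ignore non-ACGT characters are assumptions made for illustration.

```python
from collections import Counter


def snp_columns(aligned, min_second=3, flank=10):
    """Return (pos, window_start, window_end) for each qualifying column.

    A column qualifies when its second most abundant nucleotide
    (counting A, C, G, T only) occurs at least `min_second` times.
    Positions are 0-based; the window is the half-open slice that
    covers `flank` columns on each side, clipped to the alignment.
    """
    length = len(aligned[0])
    hits = []
    for pos in range(length):
        counts = Counter(seq[pos].upper() for seq in aligned)
        # most_common() is sorted by decreasing count; keep nucleotides only
        ranked = [n for base, n in counts.most_common() if base in "ACGT"]
        if len(ranked) >= 2 and ranked[1] >= min_second:
            start = max(0, pos - flank)
            end = min(length, pos + flank + 1)
            hits.append((pos, start, end))
    return hits


if __name__ == "__main__":
    demo = ["ACGTA"] * 3 + ["ATGTA"] * 3
    for pos, start, end in snp_columns(demo, min_second=3, flank=2):
        print(pos, demo[0][start:end])
```

Requiring a minimum count for the second allele, rather than just any difference, filters out columns where a single read carries a likely sequencing error.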

dCAPS finding tools

Potential Tools

Testing

SNP2CAPS

  • Get relevant files: code, example, and rebase.
wget http://pgrc.ipk-gatersleben.de/snp2caps/download/SNP2CAPS.pl
wget http://pgrc.ipk-gatersleben.de/snp2caps/download/example
wget http://rebase.neb.com/rebase/link_gcg   # does not work; download the file directly instead
  • This tool requires the enzymes to be specified in advance, which is not optimal.
  • The downloaded Perl script had some problems. A modified version now works.

dCAPS Finder

seq1=ATGATAAGAGGCGGA&mutseq=ATGATAAGAGGCGGA&nmismatch=0
  • Too slow.
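
Since the web tool is driven by a query string like the one above, a batch of queries could be prepared programmatically. The sketch below only builds the query string; the base URL of dCAPS Finder is not recorded in these notes, so no request is sent. The helper name `dcaps_query` is hypothetical.

```python
from urllib.parse import urlencode


def dcaps_query(seq1, mutseq, nmismatch=0):
    """Build the query string for one wild-type/mutant sequence pair.

    Parameter names (seq1, mutseq, nmismatch) are taken from the
    query string recorded in the notes above.
    """
    return urlencode({"seq1": seq1, "mutseq": mutseq, "nmismatch": nmismatch})


if __name__ == "__main__":
    print(dcaps_query("ATGATAAGAGGCGGA", "ATGATAAGAGGCGGA"))
```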

Continue on SNP2CAPS

  • The Perl script does not work as downloaded; it needed some minor modifications.
  • Work on:
    1. A batch SNP2CAPS run script and a parser for the output
    2. A batch EMBOSS:restrict run script and an output parser

SNP2CAPS script dev

  • WD: ~/project/radish/6_snp_primer/2_batch_snp2caps
  • Run example
./SNP2CAPS_mod.pl example gcg.805 BfaI,ClaI,BtgI,EarI,HaeII,NsiI,Cac8I,BsiEI,DdeI,AluI,NlaIII,Tsp45I > example.out
./SNP2CAPS_mod.pl test.fa gcg.805 Cac8I > test.out
  • Split the align file
python ~/project/radish/_script/script_6.2_split_align.py
  • Run batch SNP2CAPS script
python ~/project/radish/_script/script_6.2_snp2caps_run.py align_snp_all.flank.1
...
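
The split-then-run steps above can be sketched as follows. This is an illustration only, not the content of script_6.2_split_align.py or script_6.2_snp2caps_run.py. The command layout (script, FASTA file, REBASE file, comma-separated enzymes) follows the example runs above; the function names, the `.out` suffix, and the default arguments are assumptions.

```python
import subprocess


def split_evenly(records, n_parts):
    """Split records into n_parts contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(records), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def snp2caps_command(fasta, rebase="gcg.805", enzymes=("Cac8I",),
                     script="./SNP2CAPS_mod.pl"):
    """Build the argument list for one SNP2CAPS run."""
    return [script, fasta, rebase, ",".join(enzymes)]


def run_batch(fasta_files, enzymes, rebase="gcg.805"):
    """Run SNP2CAPS on each FASTA file, writing stdout to <fasta>.out (assumed naming)."""
    for fasta in fasta_files:
        with open(fasta + ".out", "w") as out:
            subprocess.run(snp2caps_command(fasta, rebase, enzymes),
                           stdout=out, check=True)
```

Passing the command as a list avoids shell quoting issues with the comma-separated enzyme list, and `check=True` stops the batch at the first failed run.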

EMBOSS restrict

  • WD: ~/project/radish/6_snp_primer/3_restrict
  • Test run restrict
~/bin/EMBOSS-4.1.0/emboss/restrict -solofragment -sequence test.fa -sitelen 4 -outfile test.restrict -rformat2 gff -enzymes BfaI,ClaI,BtgI,EarI,HaeII,NsiI,Cac8I,BsiEI,DdeI,AluI,NlaIII,Tsp45I
  • Make sure the sequence file contains no N or X characters
python ~/project/radish/_script/script_6.3_check_N_X.py align_snp_all.seqs
Seq with N or X: 0
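
A check of this kind can be sketched as below. It is not the code of script_6.3_check_N_X.py, and it assumes the sequence file is in FASTA format, which is not confirmed here. The function names are hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA lines into {header: sequence}; the header is the text after '>'."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {key: "".join(parts) for key, parts in seqs.items()}


def names_with_n_or_x(seqs):
    """Return the names of sequences containing N or X (case-insensitive)."""
    return [name for name, seq in seqs.items()
            if set(seq.upper()) & {"N", "X"}]


if __name__ == "__main__":
    demo = [">a", "ACGT", ">b", "ACNT", "GG", ">c", "acxt"]
    flagged = names_with_n_or_x(read_fasta(demo))
    print("Seq with N or X: %d" % len(flagged))
```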
  • Test run batch restrict script
python ../../_script/script_6.3_call_restrict.py test.flank2.caps align_snp_all.seqs
  • Note that the script:
    1. Uses coordinates based on the alignment, not on individual sequences
    2. Outputs the 5' splice junction coordinate only
    3. Does not account for gap positions in the coordinates, so gaps are filled with X
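
The coordinate convention in the notes above can be illustrated with a small sketch. It is not taken from script_6.3_call_restrict.py. It assumes gaps are written as `-` or `.`, and the helper names are hypothetical. Filling gaps with X keeps every sequence at alignment length, so a position reported on the filled sequence is already an alignment coordinate; the second helper shows the conversion that would be needed to get back to the ungapped sequence.

```python
def fill_gaps(aligned_seq, gap_chars="-.", filler="X"):
    """Replace gap characters with a filler so the length stays equal to the alignment length."""
    return "".join(filler if c in gap_chars else c for c in aligned_seq)


def alignment_to_sequence_pos(aligned_seq, aln_pos, gap_chars="-."):
    """Map a 0-based alignment column to the 0-based position in the ungapped sequence.

    Returns None when the column is a gap in this sequence.
    """
    if aligned_seq[aln_pos] in gap_chars:
        return None
    return sum(1 for c in aligned_seq[:aln_pos] if c not in gap_chars)


if __name__ == "__main__":
    row = "AC--GT"
    print(fill_gaps(row))
    print(alignment_to_sequence_pos(row, 4))
```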

Enzyme set II

  • Provided by David Tack.
  • Target enzyme: BclI,EcoRI,EcoRV,HindIII,KpnI,NdeI,PstI
  • Run SNP2CAPS
pwd = /home/shiu/project/radish/6_snp_primer/2_batch_snp2caps
# Split alignment file into 8
python ~/project/radish/_script/script_6.2_split_align.py
# Convert alignment to fasta and batch run SNP2CAPS
python ~/project/radish/_script/script_6.2_snp2caps_run.py align_snp_all.flank.1 BclI,EcoRI,EcoRV,HindIII,KpnI,NdeI,PstI
...
  • Run EMBOSS:restrict
pwd = ~/project/radish/6_snp_primer/3_restrict
python ../../_script/script_6.3_call_restrict.py align_snp_all.flank.4.caps align_snp_all.seqs BclI,EcoRI,EcoRV,HindIII,KpnI,NdeI,PstI
...
  • Get sequences
pwd = ~/project/radish/6_snp_primer/3_restrict/2_enz_set2
python ~/project/radish/_script/script_6.3_get_contig_names.py all.flank_rsites
python ~/project/radish/_script/script_6.3_select_contig_seq.py all.flank_rsites.qualified ../align_snp_all.seqs
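
The final selection step can be sketched as a filter over named sequences. This is not the code of script_6.3_select_contig_seq.py; the format of the `.qualified` file is not shown here, so the sketch simply assumes a list of contig names and a dictionary of sequences. The function names are hypothetical.

```python
def select_sequences(seqs, wanted_names):
    """Keep only the sequences whose name is in wanted_names, preserving input order."""
    wanted = set(wanted_names)
    return {name: seq for name, seq in seqs.items() if name in wanted}


def to_fasta(seqs, width=60):
    """Format {name: sequence} as FASTA text with lines wrapped at `width` characters."""
    lines = []
    for name, seq in seqs.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    contigs = {"c1": "ACGT", "c2": "GGCC", "c3": "TTAA"}
    print(to_fasta(select_sequences(contigs, ["c3", "c1"])), end="")
```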