Analysis:Radish dCAPS
From RadishDB
Planning
- Dworkin and Shim have worked out a pipeline.
SNP alignment file processing
- Get the SNP positions where the 2nd most abundant nucleotide has a frequency >= 3, and get the flanking 10 bp
python ~/project/radish/_script/script_6.1_parse_align.py align_snp.all 3 10
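The filter described above can be sketched as follows. This is not script_6.1_parse_align.py itself, whose source is not shown here; it is a minimal illustration that assumes the alignment is available as a list of equal-length strings, that only A/C/G/T are counted, and that "flanking 10bp" means up to 10 columns on each side. The function name and the returned tuple layout are hypothetical.

```python
from collections import Counter


def snp_columns(aligned, min_second=3, flank=10):
    """Return (column, window_start, window_end) for candidate SNP columns.

    aligned    : list of equal-length aligned sequences (assumed input form)
    min_second : minimum count of the 2nd most abundant nucleotide
    flank      : number of columns kept on each side (assumed meaning)
    Coordinates are 0-based; window_end is exclusive.
    """
    length = len(aligned[0])
    hits = []
    for col in range(length):
        # Count only unambiguous nucleotides; gaps and other symbols are ignored.
        counts = Counter(
            s[col].upper() for s in aligned if s[col].upper() in "ACGT"
        )
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[1][1] >= min_second:
            start = max(0, col - flank)
            end = min(length, col + flank + 1)
            hits.append((col, start, end))
    return hits
```

With the arguments `3 10` used in the command above, the corresponding call would be `snp_columns(aligned, min_second=3, flank=10)`.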
dCAPS finding tools
Potential Tools
Testing
SNP2CAPS
- Get relevant files: code, example, and rebase.
wget http://pgrc.ipk-gatersleben.de/snp2caps/download/SNP2CAPS.pl
wget http://pgrc.ipk-gatersleben.de/snp2caps/download/example
wget http://rebase.neb.com/rebase/link_gcg
(The last wget doesn't work; the file needs to be downloaded directly.)
- This tool requires the enzymes to be specified, which is not optimal.
- The downloaded Perl script had some problems. It has been modified and works now.
dCAPS Finder
- Web-based; considering writing a robot to submit queries.
- Find out how to get the CGI form fields: HTML form to email
- Original
- File modified: change the action field to 'mailto:you@yourdomain.com'
- Form result
seq1=ATGATAAGAGGCGGA&mutseq=ATGATAAGAGGCGGA&nmismatch=0
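If a robot were written, the form result above could be reproduced with the standard library. The sketch below only builds the query string from the three field names seen in the form result (seq1, mutseq, nmismatch); the function name is hypothetical, and the submission URL is deliberately left out because it is not recorded here.

```python
from urllib.parse import urlencode


def build_dcaps_query(seq1, mutseq, nmismatch=0):
    """Encode the three form fields in the order seen in the form result."""
    fields = {"seq1": seq1, "mutseq": mutseq, "nmismatch": nmismatch}
    return urlencode(fields)


if __name__ == "__main__":
    print(build_dcaps_query("ATGATAAGAGGCGGA", "ATGATAAGAGGCGGA", 0))
```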
- Too slow.
Continue on SNP2CAPS
- The Perl script does not work as downloaded. It needed some minor modifications.
- Work on:
- A batch SNP2CAPS run script and a parser for the output
- A batch EMBOSS:restrict run script and an output parser
SNP2CAPS script dev
- WD: ~/project/radish/6_snp_primer/2_batch_snp2caps
- Run example
./SNP2CAPS_mod.pl example gcg.805 BfaI,ClaI,BtgI,EarI,HaeII,NsiI,Cac8I,BsiEI,DdeI,AluI,NlaIII,Tsp45I > example.out
./SNP2CAPS_mod.pl test.fa gcg.805 Cac8I > test.out
- Split the align file
python ~/project/radish/_script/script_6.2_split_align.py
- Run batch SNP2CAPS script
python ~/project/radish/_script/script_6.2_snp2caps_run.py align_snp_all.flank.1 ...
EMBOSS restrict
- WD: ~/project/radish/6_snp_primer/3_restrict
- Test run restrict
~/bin/EMBOSS-4.1.0/emboss/restrict -solofragment -sequence test.fa -sitelen 4 -outfile test.restrict -rformat2 gff -enzymes BfaI,ClaI,BtgI,EarI,HaeII,NsiI,Cac8I,BsiEI,DdeI,AluI,NlaIII,Tsp45I
- Make sure no N and X are in the sequence file
python ~/project/radish/_script/script_6.3_check_N_X.py align_snp_all.seqs
Seq with N or X: 0
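A check of this kind can be sketched as below. This is not script_6.3_check_N_X.py; it assumes the sequence file is FASTA-formatted (the actual format of align_snp_all.seqs is not stated here), and both function names are hypothetical.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def count_seqs_with_n_or_x(lines):
    """Count sequences containing at least one N or X (case-insensitive)."""
    return sum(
        1 for _, seq in read_fasta(lines) if set(seq.upper()) & {"N", "X"}
    )
```

For a file on disk, the count could be obtained with `count_seqs_with_n_or_x(open(path))`, where `path` is the sequence file.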
- Test run batch restrict script
python ../../_script/script_6.3_call_restrict.py test.flank2.caps align_snp_all.seqs
- Note that the script:
- Uses coordinates based on the alignment, not on the individual sequences
- Outputs the 5' splice junction coordinate only
- Does not take gap positions into account in the coordinates, so gaps are filled with X
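The gap handling noted above can be illustrated with a small sketch. It assumes gaps are written as "-" or "." in the alignment, which is not confirmed here, and the function names are hypothetical. Filling gaps with X keeps every sequence at the alignment length, so a reported coordinate refers to the same column in all sequences; this is presumably also why the input is first checked for pre-existing X characters.

```python
def fill_gaps(aligned_seq, gap_chars="-.", filler="X"):
    """Replace gap characters so the sequence keeps alignment coordinates."""
    return "".join(filler if c in gap_chars else c for c in aligned_seq)


def alignment_to_ungapped(aligned_seq, pos, gap_chars="-."):
    """Map a 0-based alignment column to a 0-based index in the ungapped
    sequence, or return None if the column is a gap in this sequence."""
    if aligned_seq[pos] in gap_chars:
        return None
    return sum(1 for c in aligned_seq[:pos] if c not in gap_chars)
```

The second helper shows the conversion that would be needed to go from an alignment-based coordinate back to a position in an individual sequence.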
Enzyme set II
- Provided by David Tack.
- Target enzyme: BclI,EcoRI,EcoRV,HindIII,KpnI,NdeI,PstI
- Run SNP2CAPS
pwd = /home/shiu/project/radish/6_snp_primer/2_batch_snp2caps
# Split alignment file into 8
python ~/project/radish/_script/script_6.2_split_align.py
# Convert alignment to fasta and batch run SNP2CAPS
python ~/project/radish/_script/script_6.2_snp2caps_run.py align_snp_all.flank.1 BclI,EcoRI,EcoRV,HindIII,KpnI,NdeI,PstI
...
- Run EMBOSS:restrict
pwd = ~/project/radish/6_snp_primer/3_restrict
python ../../_script/script_6.3_call_restrict.py align_snp_all.flank.4.caps align_snp_all.seqs BclI,EcoRI,EcoRV,HindIII,KpnI,NdeI,PstI
...
- Get sequences
pwd = ~/project/radish/6_snp_primer/3_restrict/2_enz_set2
python ~/project/radish/_script/script_6.3_get_contig_names.py all.flank_rsites
python ~/project/radish/_script/script_6.3_select_contig_seq.py all.flank_rsites.qualified ../align_snp_all.seqs
