cDNA sequence/analysis

From Radish


Linkage map




Analysis Notes

Novel coding sequence

  • Several tiling array studies have found that unannotated regions of eukaryotic genomes are expressed. In addition, genes encoding small proteins are generally harder to predict, and many have been identified experimentally rather than by computational prediction.
  • As a proof of concept, we have identified >900 small open reading frames in the intergenic regions of the Arabidopsis thaliana genome that are highly likely to be real genes. The work was supported mainly by the Radish transcriptome sequencing grant and has been published in Genome Research.

EST assembly

The contigs available for download were generated by the JCVI plant TA pipeline. Assembly was performed separately for each library, in the following way:

  1. Use seqclean to clean all the EST sequences: discard sequences shorter than 100 bp, search the internal UniVec database and screen out any vector sequences, and use DUST to remove low-complexity sequences.
  2. Use the TGICL pipeline to generate contigs with the following parameters:
    • Minimum percent identity for overlaps (PID): 95
    • Minimum overlap length: 50
    • Maximum length of unmatched overhangs: 20
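
The thresholds above can be read as simple acceptance rules. The following Python sketch is only an illustration of those rules, not the seqclean or TGICL implementation: the function names, the Overlap record, and the assumption that the three overlap criteria are applied together are all hypothetical. It also assumes that "shorter than 100 bp" means a 100 bp sequence is kept.

```python
from dataclasses import dataclass

# Thresholds taken from the steps above.
MIN_EST_LENGTH = 100          # bp; shorter ESTs are discarded during cleaning
MIN_PERCENT_IDENTITY = 95.0   # PID for overlaps
MIN_OVERLAP_LENGTH = 50       # bp
MAX_OVERHANG = 20             # bp of unmatched overhang


def passes_length_filter(sequence: str) -> bool:
    """Keep an EST only if it is at least MIN_EST_LENGTH bases long."""
    return len(sequence) >= MIN_EST_LENGTH


@dataclass
class Overlap:
    """Hypothetical summary of one pairwise overlap between two ESTs."""
    percent_identity: float  # identity over the aligned region, in percent
    length: int              # length of the aligned region, in bp
    overhang: int            # longest unmatched overhang, in bp


def overlap_accepted(overlap: Overlap) -> bool:
    """Apply the three overlap thresholds together (an assumption)."""
    return (
        overlap.percent_identity >= MIN_PERCENT_IDENTITY
        and overlap.length >= MIN_OVERLAP_LENGTH
        and overlap.overhang <= MAX_OVERHANG
    )
```

For example, under these assumptions an overlap of 60 bp at 96% identity with a 10 bp overhang would be accepted, while the same overlap with a 25 bp overhang would be rejected.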

Marker identification