Reputation: 1

Finding RNAs and information in a region

I want to find novel and known RNAs and transcripts in a sequence of about 10 kb. What is the easiest way to start, using bioinformatics tools, if that sequence is not well annotated in the Ensembl and UCSC browsers? Are spliced ESTs and RNA-sequencing data one option? I am new to bioinformatics, so your suggestions would be useful to me.

Thanks in advance

Upvotes: 0

Answers (2)

kanne

Reputation: 63

Do you have a Linux server or computer, or are you relying on web and Windows-based programs?

To align RNA-seq reads, people generally use spliced-read aligners like TopHat, although BLAST would probably work too.

Initially I wrote a long response explaining how to do this in Linux, but I've just realised that Galaxy might be a much easier solution for a beginner. Galaxy is an online bioinformatics tool with a very user-friendly interface; it's particularly designed for beginners. You can sign up and log in at this website: https://main.g2.bx.psu.edu/

There are tutorials on how to do things (see 'Help' menu) but my basic workflow for your experiment would go something like this:

1. Log into Galaxy.
2. Upload your RNA-seq reads, EST reads and 10 kb genome sequence.
3. In the menu on the left, click to expand "NGS-RNA sequencing", then click "Tophat for Illumina" (assuming your RNA-seq reads are Illumina fastq reads).
4. Align your RNA-seq reads using TopHat, making sure to select your 10 kb sequence as the reference genome.
5. Try aligning your EST reads with one of the programs. I'm not sure how successful this will be: TopHat isn't designed to work with long sequences, so you might have to experiment or be a bit creative to get this working.
6. Use Cufflinks to create annotation for novel gene models, based on your aligned RNA-seq reads and/or EST sequences.
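
For anyone working on a Linux command line rather than in Galaxy, the alignment and assembly steps above correspond roughly to three commands. The sketch below only builds and prints them (it runs nothing). It assumes Bowtie2, TopHat2 and Cufflinks are installed; the file names region.fa and reads.fastq, the index name and the output directory names are placeholders, not part of the workflow described above.

```python
import shlex


def build_pipeline(reference_fasta, reads_fastq, index_base="region"):
    """Return the three commands as argument lists, in the order they would be run."""
    return [
        # 1. Index the 10 kb reference so TopHat2 (via Bowtie2) can align against it.
        ["bowtie2-build", reference_fasta, index_base],
        # 2. Spliced alignment of the RNA-seq reads; results go to tophat_out/.
        ["tophat", "-o", "tophat_out", index_base, reads_fastq],
        # 3. Assemble transcripts (gene models) from the aligned reads.
        ["cufflinks", "-o", "cufflinks_out", "tophat_out/accepted_hits.bam"],
    ]


# Placeholder file names; replace them with your own reference and reads.
commands = build_pipeline("region.fa", "reads.fastq")
for cmd in commands:
    print(shlex.join(cmd))
```

Once the tools are installed, each list could be passed to subprocess.run(cmd, check=True). Cufflinks writes its assembled transcripts to transcripts.gtf in its output directory, which can then be loaded into a viewer alongside the reference sequence.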

Regarding viewing the output, I'm not sure what is available for a custom reference sequence on Windows; you might have to do a bit of research. For Linux/Mac, I'd recommend IGV.

Upvotes: 0

Wes Field

Reputation: 3441

I am a bit unclear on what exactly your desired end product or output would look like, but I would suggest doing multiple sequence alignments and looking for those with high scores. Chances are that this 10 kb sequence contains some of the known sequences but that they won't match exactly, so I think you want a program that gives you alignment scores and not just simple matches.

I use Perl in combination with Clustal to make alignments. Basically, you will need to make .fasta or .aln files containing both the 10 kb sequence and a known sequence of interest, following the conventions of the respective file format. You can use the GUI version of Clustal if you are not too programming savvy. If you want to use Perl, here is a script I wrote for aligning a whole directory of .fasta files. It can perform many alignments in one fell swoop. NOTE: you must edit the clustal executable path in the system call at the end of the script to match its location on your computer for this script to function.

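#!/usr/bin/perl

use strict;
use warnings;

print "Please type the directory containing the fasta files to align (end the directory path with a / or this will fail!): ";
my $directory = <STDIN>;
chomp $directory;

opendir(my $dh, $directory) or die "Cannot open $directory: $!";

# Keep only .fasta files; skip '.', '..' and alignments written by earlier runs.
my @files = grep { /\.fasta$/ && !/_align\.fasta$/ } readdir $dh;
closedir $dh;

my $add = "_align.fasta";

foreach my $file (@files) {
    my $infile = "$directory$file";
    # Strip the extension so the output becomes <name>_align.fasta.
    (my $fileprefix = $infile) =~ s/\.[^.]+$//;
    my $outfile = "$fileprefix$add";
    # Edit the clustalw2 path below; the list form of system copes with spaces in file names.
    system("/Users/Wes/Desktop/eggNOG_files/clustalw-2.1-macosx/clustalw2", "-INFILE=$infile", "-OUTFILE=$outfile", "-OUTPUT=FASTA");
}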
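
The script above expects each .fasta file in the directory to already hold the sequences to be aligned. As a minimal sketch of preparing such a file (in Python rather than Perl, and with made-up sequence names and toy sequences that stand in for the real 10 kb region and a known RNA), a two-record FASTA file could be written like this:

```python
import os
import tempfile
import textwrap


def write_pair_fasta(path, region_name, region_seq, known_name, known_seq, width=60):
    """Write two sequences to one FASTA file, wrapping sequence lines at `width` columns."""
    with open(path, "w") as handle:
        for name, seq in ((region_name, region_seq), (known_name, known_seq)):
            handle.write(f">{name}\n")
            handle.write("\n".join(textwrap.wrap(seq, width)) + "\n")


# Toy example: placeholder names and sequences, not real data.
out_path = os.path.join(tempfile.mkdtemp(), "region_vs_known.fasta")
write_pair_fasta(out_path, "region_10kb", "ACGT" * 40, "known_rna", "ACGTTGCA" * 5)

with open(out_path) as handle:
    lines = handle.read().splitlines()
print(lines[0], len(lines))
```

One such file per known sequence, all placed in the same directory, would let the script align the region against each of them in a single run.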

Upvotes: 1
