Reputation: 39
I'm working on some DNA (A, T, C and G, with the chance of U thrown in).
Right now I have a really long string full of DNA of undefined length. I've got the code for the nucleotide bases done.
%nucleotide_bases = ( A => Adenine,
T => Thymine,
G => Guanine,
C => Cytosine );
$nucleotide_bases{'U'} = 'This is a RNA base called Uracil';#T=U for RNA
Now all I need to do is put in some sort of loop to read each single character from the string. Since this code is for students, it needs to be simple. I started using Perl myself a few weeks ago, and Java before that.
The program needs to print the full name of each base in the string (it's called $string1) as it is read, one at a time. So when the string says ATATCGCG
The output to the screen needs to read: Adenine Thymine Adenine Thymine Cytosine Guanine Cytosine Guanine
If this is too tricky to do from a string, I can use an array as a starting point. Many thanks for your assistance.
Excellent answers. We'll be all set now.
The other question I had was about making sure the user could only enter DNA bases (A, T, C & G). I think this is called input validation.
print "Please enter your first DNA sequence now: \n";
$userinput1=<>;
chomp $userinput1;
How would you add input validation there? The first print statement should always be re-asked unless conditions are met.
I know I need something like
if($userinput1 ne 'a' or 't' or 'c' or 'g') {
print "Please enter DNA only (A, T, C or G)";
}
I'm not totally sure how to get back to the original print statement.
Upvotes: 2
Views: 1498
Reputation: 126722
Please always use strict and use warnings at the start of all your Perl programs, especially those you are seeking help with. That way Perl will catch a lot of simple errors that you haven't noticed, and you will produce working code much more quickly.
This can be done very simply by splitting the string into characters, using the hash to translate them, and then joining them up again.
This program demonstrates the idea. Note that I have left the code that constructs the hash as you supplied it (apart from quoting the values), simply because you may prefer it that way.
use strict;
use warnings;
my %nucleotide_bases = (
A => 'Adenine',
T => 'Thymine',
G => 'Guanine',
C => 'Cytosine',
);
$nucleotide_bases{'U'} = 'This is a RNA base called Uracil'; #T=U for RNA
my $chain = 'ATATCGCG';
my $expand = join ' ', map $nucleotide_bases{$_}, split //, $chain;
print $expand, "\n";
output
Adenine Thymine Adenine Thymine Cytosine Guanine Cytosine Guanine
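One thing to watch with the one-liner above: a character that is not a key in the hash makes the lookup return undef, which prints as an empty string and triggers an "uninitialized value" warning under use warnings. Here is a minimal sketch of one way to handle that, assuming you want a visible placeholder rather than silence. The sub name expand_bases and the "Unknown(...)" text are my own choices for illustration, not anything from your code.

```perl
use strict;
use warnings;

my %nucleotide_bases = (
    A => 'Adenine',
    T => 'Thymine',
    G => 'Guanine',
    C => 'Cytosine',
    U => 'Uracil',
);

# Hypothetical helper: translate each character of the chain,
# substituting a placeholder for anything that is not a hash key.
sub expand_bases {
    my ($chain) = @_;
    return join ' ',
        map { exists $nucleotide_bases{$_} ? $nucleotide_bases{$_} : "Unknown($_)" }
        split //, $chain;
}

print expand_bases('ATXG'), "\n";    # Adenine Thymine Unknown(X) Guanine
```

Whether to print a placeholder, skip the character, or stop with an error is a design choice that depends on what you want the students to see.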
Edit
As requested, this is to read a sequence from the console and repeat as long as the sequence supplied is invalid. The output is identical to that of the preceding code.
use strict;
use warnings;
my %nucleotide_bases = (
A => 'Adenine',
T => 'Thymine',
G => 'Guanine',
C => 'Cytosine',
);
$nucleotide_bases{'U'} = 'This is a RNA base called Uracil'; #T=U for RNA
my $userinput1;
while () {
print "Please enter your first DNA sequence now: ";
chomp ($userinput1 = uc <>);
last unless $userinput1 =~ /[^ATGC]/;
print qq("$userinput1" is an invalid sequence\n);
}
my $expand = join ' ', map $nucleotide_bases{$_}, split //, $userinput1;
print $expand, "\n";
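Note that the test /[^ATGC]/ only looks for a forbidden character, so an empty line counts as valid, and at end of input <> returns undef. If that matters, a sketch along the following lines might help. It uses an anchored pattern that requires at least one base, and it wraps the prompt loop in a sub. The names is_valid_dna and read_dna are hypothetical, chosen just for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 only for a non-empty string made up
# entirely of upper-case A, T, G and C; returns 0 otherwise.
sub is_valid_dna {
    my ($sequence) = @_;
    return 0 unless defined $sequence;
    return $sequence =~ /\A[ATGC]+\z/ ? 1 : 0;
}

# Hypothetical helper: repeat the prompt until a valid sequence is
# entered. Returns nothing (undef in scalar context) if input ends first.
sub read_dna {
    my ($prompt) = @_;
    while (1) {
        print $prompt;
        my $line = <STDIN>;
        return unless defined $line;    # end of input
        chomp $line;
        $line = uc $line;               # accept lower-case input too
        return $line if is_valid_dna($line);
        print qq("$line" is not a valid DNA sequence\n);
    }
}

# Quick check of the validator on a few made-up inputs.
for my $candidate ('ATATCGCG', 'ATXG', '') {
    my $verdict = is_valid_dna($candidate) ? 'valid' : 'invalid';
    print qq("$candidate": $verdict\n);
}
```

In the program you would then write something like my $userinput1 = read_dna("Please enter your first DNA sequence now: "); and check that the result is defined before using it.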
Upvotes: 0
Reputation: 43673
Script:
#!/usr/bin/perl
use strict;
use warnings;
my %nucleotide_bases = ( A => 'Adenine',
T => 'Thymine',
G => 'Guanine',
C => 'Cytosine',
U => 'Uracil' );
my $string1 = 'ATATCGCG';
$string1 =~ s/([ATGCU])/{$nucleotide_bases{$1}.' '}/ge;
print $string1, "\n";
Output:
Adenine Thymine Adenine Thymine Cytosine Guanine Cytosine Guanine
Upvotes: 0
Reputation: 67900
I assume you're trying to decode the various letters A, T, G and C from a string and print out their full name.
print "$nucleotide_bases{$_} " for split //, $string;
Or use an array:
my @array = map $nucleotide_bases{$_}, split(//, $string);
print "@array"; # quoted to insert spaces between elements.
As an alternative to split, you can use a regex, which will exclude any non-relevant characters from being decoded:
my @array = $string =~ /[ATCG]/g;
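The match on its own only gives you the letters, so you still need the hash lookup to get the names. Here is a small sketch of the two steps together. The input string with stray characters is made up for the example, and the variable names @bases and @names are just illustrative.

```perl
use strict;
use warnings;

my %nucleotide_bases = (
    A => 'Adenine',
    T => 'Thymine',
    G => 'Guanine',
    C => 'Cytosine',
);

my $string = 'AT-AT CGxCG';    # made-up input with some stray characters

# In list context, a /g match returns every matching character in order,
# silently skipping anything that is not A, T, C or G.
my @bases = $string =~ /[ATCG]/g;

# Translate each surviving letter through the hash.
my @names = map $nucleotide_bases{$_}, @bases;

print "@names\n";    # Adenine Thymine Adenine Thymine Cytosine Guanine Cytosine Guanine
```

Bear in mind that this quietly drops bad characters instead of reporting them, which may or may not be what you want for students.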
Oh, and when you assign values to your hash, you need to quote the values. Nice catch by Luke Girvin.
my %nucleotide_bases = ( A => "Adenine", ... );
Upvotes: 3
Reputation: 13442
Using the recipe Processing a String One Character at a Time, I came up with this:
use warnings;
use strict;
my %nucleotide_bases = (
    A => 'Adenine',
    T => 'Thymine',
    G => 'Guanine',
    C => 'Cytosine'
);
my $string = 'ATATCGCG';
my @array = split(//, $string);
foreach (@array) {
    my $char = $_;
    print $nucleotide_bases{$char}, ' ';
}
Note that I'm using use warnings and use strict (which, as a beginner, you should probably be doing too), so I had to add quotes around the base names. Also, the program prints out an extra space at the end.
Upvotes: 3