Shtanto

Reputation: 39

Perl $strings to hash tables conversion

I'm working with some DNA sequences (A, T, C and G, with the chance of U thrown in).

Right now I have a really long string of DNA of undefined length. I've got the code for the nucleotide bases done.

%nucleotide_bases = ( A => Adenine,
                      T => Thymine,
                      G => Guanine,
                      C => Cytosine );

$nucleotide_bases{'U'} = 'This is a RNA base called Uracil';#T=U for RNA

Now all I need to do is add some sort of loop to read each single character from the string. Since this code is for students, it needs to be simple. I started using Perl myself a few weeks ago, and used Java before that.

The program needs to print the full name of each base in the string (called $string1) as it is read, one character at a time. So when the string is ATATCGCG, the output to the screen needs to read:

Adenine Thymine Adenine Thymine Cytosine Guanine Cytosine Guanine

If this is too tricky to do from a string, I can use an array as a starting point. Many thanks for your assistance.

Excellent answers. We'll be all set now.

The other question I had was about making sure the user could only enter DNA bases (A, T, C & G). I think this is called input validation.

print "Please enter your first DNA sequence now: \n";
$userinput1=<>;
chomp $userinput1;

How would you add input validation there? The prompt should be repeated until the input is valid.

I know I need something like

 if($userinput1 ne 'a' or 't' or 'c' or 'g') {
 print "Please enter DNA only (A, T, C or G)";
 }

I'm not totally sure how to get back to the original print statement.

Upvotes: 2

Views: 1498

Answers (4)

Borodin

Reputation: 126722

Please always use strict and use warnings at the start of all your Perl programs, especially those you are seeking help with. That way Perl will catch a lot of simple errors that you haven't noticed, and you will produce working code much more quickly.

This can be done very simply by splitting the string into characters, using the hash to translate them, and then joining them up again.

This program demonstrates the idea. Note that, apart from quoting the values, I have left the code that constructs the hash as you supplied it, simply because you may prefer it that way.

use strict;
use warnings;

my %nucleotide_bases = (
  A => 'Adenine', 
  T => 'Thymine', 
  G => 'Guanine', 
  C => 'Cytosine',
);
$nucleotide_bases{'U'} = 'This is a RNA base called Uracil'; #T=U for RNA

my $chain = 'ATATCGCG';

my $expand = join ' ', map $nucleotide_bases{$_}, split //, $chain;

print $expand, "\n";

output

Adenine Thymine Adenine Thymine Cytosine Guanine Cytosine Guanine

Edit

As requested, this is to read a sequence from the console and repeat as long as the sequence supplied is invalid. The output is identical to that of the preceding code.

use strict;
use warnings;

my %nucleotide_bases = (
  A => 'Adenine', 
  T => 'Thymine', 
  G => 'Guanine', 
  C => 'Cytosine',
);
$nucleotide_bases{'U'} = 'This is a RNA base called Uracil'; #T=U for RNA

my $userinput1;
while () {
  print "Please enter your first DNA sequence now: ";
  chomp ($userinput1 = uc <>);
  last unless $userinput1 =~ /[^ATGC]/;
  print qq("$userinput1" is an invalid sequence\n);  # print, not printf: user input must not be used as a format string
} 

my $expand = join ' ', map $nucleotide_bases{$_}, split //, $userinput1;

print $expand, "\n";
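
For reference, the condition sketched in the question, $userinput1 ne 'a' or 't' or 'c' or 'g', is always true: or joins separate conditions, and the bare string 't' is a true value on its own. Below is a minimal sketch of the same check wrapped in a sub. The name is_dna is my own invention for illustration, and it assumes the input has already been upper-cased as in the loop above.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is an assumption, not from the question):
# true only for a non-empty string made up solely of A, T, G and C.
sub is_dna {
    my ($sequence) = @_;
    return $sequence =~ /\A[ATGC]+\z/ ? 1 : 0;
}

# Try a few made-up inputs and report how each one is classified.
for my $candidate ('ATATCGCG', 'ATXG', '') {
    printf "%-10s %s\n", qq("$candidate"), is_dna($candidate) ? 'valid' : 'invalid';
}
```

Unlike the loop above, this version also rejects an empty line, which may or may not be what you want.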

Upvotes: 0

Ωmega

Reputation: 43673

Script:

#!/usr/bin/perl

use strict;
use warnings;

my %nucleotide_bases = ( A => 'Adenine',
                         T => 'Thymine',
                         G => 'Guanine',
                         C => 'Cytosine',
                         U => 'Uracil' );

my $string1 = 'ATATCGCG';

$string1 =~ s/([ATGCU])/{$nucleotide_bases{$1}.' '}/ge;

print $string1, "\n";

Output:

Adenine Thymine Adenine Thymine Cytosine Guanine Cytosine Guanine 

Upvotes: 0

TLP

Reputation: 67900

I assume you're trying to decode the various letters A, T, G and C from a string and print out their full names.

print "$nucleotide_bases{$_} " for split //, $string;

Or use an array:

my @array = map $nucleotide_bases{$_}, split(//, $string);
print "@array"; # quoted to insert spaces between elements.

As an alternative to split, you can use a regex, which will exclude any non-relevant characters from being decoded:

my @array = $string =~ /[ATCG]/g;
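
Putting the two ideas together, here is a sketch that decodes only the recognised letters. The sample string, with a stray X and a space in it, is made up for illustration.

```perl
use strict;
use warnings;

my %nucleotide_bases = (
    A => 'Adenine',
    T => 'Thymine',
    G => 'Guanine',
    C => 'Cytosine',
);

# Made-up sample input containing characters that are not DNA bases.
my $string = 'ATXA TCG';

# In list context the /g match returns every matching character,
# so the X and the space never reach the hash lookup.
my @names = map $nucleotide_bases{$_}, $string =~ /[ATCG]/g;

print "@names\n";
```

With split // on the same string, the X and the space would be looked up in the hash, find nothing, and trigger "uninitialized value" warnings under use warnings.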

Oh, and when you assign values to your hash, you need to quote the values. Nice catch by Luke Girvin.

my %nucleotide_bases = ( A => "Adenine", ... );

Upvotes: 3

Luke Girvin

Reputation: 13442

Using the recipe "Processing a String One Character at a Time", I came up with this:

use warnings;
use strict;

my %nucleotide_bases = ( A => 'Adenine', 
             T => 'Thymine', 
             G => 'Guanine', 
             C => 'Cytosine' );

my $string = 'ATATCGCG';
my @array = split(//, $string);
foreach (@array) {
    my $char = $_;
    print $nucleotide_bases{$char}, ' ';
}

Note that I'm using use warnings and use strict (which, as a beginner, you should probably be doing too), so I had to add quotes around the base names. Also, the program prints out an extra space at the end.
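
If the trailing space matters, one way to avoid it is to collect the names first and let join put the separator only between them. A sketch, reusing the same hash and string (the @names and $line variables are introduced here for illustration):

```perl
use strict;
use warnings;

my %nucleotide_bases = ( A => 'Adenine',
                         T => 'Thymine',
                         G => 'Guanine',
                         C => 'Cytosine' );

my $string = 'ATATCGCG';

# Look up each character and collect the full names in order.
my @names;
foreach my $char (split //, $string) {
    push @names, $nucleotide_bases{$char};
}

# join inserts a space between elements only, so nothing trails the last name.
my $line = join ' ', @names;
print $line, "\n";
```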

Upvotes: 3
