Reputation: 3
I've been struggling with this for a while and I was wondering if there was something obvious I've missed.
As programming learning/practice, I'm trying to put together a simple script for calculating the components of a restriction enzyme digest mix. However, first I need to get a list of enzyme stock concentrations.
I pulled all the individual pages from the New England Biolabs enzyme page, and my goal with this current script is to pull out the name of the enzyme and the concentrations available from the company.
This example works with a local copy of EcoRI (link included at bottom of submission).
use warnings;
use strict;

open(FILE,'productR0101.asp');

my $line;
my $counter;
my $array1;
my $array2;
my $array3;
my $concentration;
my @array4;

$counter = 1;
while ($line = <FILE>) {
    chomp($line);
    if ($counter == 6 ){
        $array1 = $line;
        $counter++;
    }
    else{
        $counter++;
    }
    if ($line =~ m/.{8}units.ml/g) {
        (@array4) =$line =~ m/.{8}units.ml/g;
        print @array4;
    }
}
print "\n".$array1;
exit;
Every file has the enzyme name on the sixth line of the file, so I just pulled that whole line. However, the concentrations are in different locations, so my approach was to read in the file one line at a time, and match to the units/ml tag.
My thinking was that it should print out the match for each line, if there was one, every time the while loop runs, effectively resulting in a string of separate print statements.
This is where I get messed up. There are six different locations in this file with a units/ml tag: three for 20,000 and three for 100,000.
I was expecting six different results printed, but when I run this, only one 100,000 units/ml result is returned.
I've tried all sorts of fixes. I tried concatenating strings, I tried storing it as a string, I tried concatenating it onto another array that never gets touched by the (@array4) = $line =~ m/.{8}units.ml/g line, and it either breaks it or gives the same result.
And finally, I apologize for any weird conventions. I'm still learning Perl, and my first experience programming was with MATLAB.
Also, the $array1, $array2, etc. exist because I was trying to keep track of exactly what was getting put where; my intention is to clean it up once I get it functional.
So does anyone have any ideas about what I'm doing wrong?
EDIT: the data source is the source code to each individual enzyme page. For this example, if you view the page source you get the complete input file I gave to the script.
Upvotes: 0
Views: 434
Reputation: 1697
I can't exactly reproduce the behavior you've reported of only getting one of the 100,000 units/ml results, as I'm not exactly sure what your input data is. However, I think the problem is with the regular expression not having any captures. You should put parentheses around the part of the regex match that you want to be returned to @array4. So instead of this:
@array4 = $line =~ m/.{8}units.ml/g;
Try this:
@array4 = $line =~ m/(.{8})units.ml/g;
@array4 = $line =~ /(.{8})units.ml/;
EDIT: You also don't want to use the m/ and /g modifiers.
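To make the difference between these forms concrete, here is a small sketch of what a list-context match returns with and without a capture group, and with and without /g. The sample line is made up for illustration and is an assumption, not something taken from the actual enzyme page.

```perl
use strict;
use warnings;

# Hypothetical sample line; the real page markup may look different.
my $line = 'Stock: 20,000 units/ml and 100,000 units/ml';

# No capture group, /g, list context: every whole match is returned.
my @whole = $line =~ /.{8}units.ml/g;

# Capture group, /g, list context: only the captured parts are returned.
my @captured = $line =~ /(.{8})units.ml/g;

# Capture group, no /g: only the capture from the first match is returned.
my @first = $line =~ /(.{8})units.ml/;

print "whole:    [$_]\n" for @whole;
print "captured: [$_]\n" for @captured;
print "first:    [$_]\n" for @first;
```

Running this on your own input lines should show which form gives the list you actually want in @array4.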
Upvotes: 0
Reputation: 336138
Are the 20,000 units/ml at the start of the line? Because in that case, .{8} would fail to match - the dot doesn't match newlines, and 20,000_ is only 7 characters.
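A quick way to check this is the sketch below. The test lines are invented for illustration, and the helper names fixed_width and flexible are hypothetical. The second pattern, which captures a run of digits and commas instead of a fixed eight characters, is only one possible alternative and assumes the concentration is always written that way.

```perl
use strict;
use warnings;

# Fixed-width pattern from the question: needs exactly 8 characters before "units".
sub fixed_width {
    my ($line) = @_;
    return $line =~ /(.{8})units.ml/g;
}

# Assumed alternative: capture digits and commas, however many there are.
sub flexible {
    my ($line) = @_;
    return $line =~ m{([\d,]+)\s*units/ml}g;
}

# Hypothetical lines; the actual page source may be laid out differently.
my @lines = (
    '20,000 units/ml',          # concentration at the very start of the line
    'Conc: 20,000 units/ml',    # concentration preceded by other text
    '100,000 units/ml',         # eight characters available before "units"
);

for my $line (@lines) {
    my @fixed = fixed_width($line);
    my @flex  = flexible($line);
    print "'$line': fixed=[@fixed] flexible=[@flex]\n";
}
```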
Upvotes: 1
Reputation: 126722
We really need to see the data you are processing, but it looks like you are storing only the last occurrence of /units.ml/ in @array4 because you are reading the file line by line.
I will add to this answer if you supplement your question, but for now I need to know:
What your data looks like
What the mysterious /.{8}/ is for
Are you aware that $array1, $array2, and $array3 are scalars, as well as being very bad names for variables?
For now, here is a rewrite of your code using idiomatic Perl, and the $. variable that evaluates to the line number of the file most recently read:
use strict;
use warnings;

open my $file, '<', 'productR0101.asp' or die $!;

my $array1;
my @array4;

while (my $line = <$file>) {
    chomp $line;
    $array1 = $line if $. == 6;
    if ($line =~ m/.{8}units.ml/) {
        @array4 = $line =~ m/.{8}units.ml/g;
        print "@array4\n";
    }
}
print "\n".$array1;
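If the goal is to keep every concentration rather than only those from the most recent matching line, one option is to push each line's matches onto an array instead of assigning to it. The sketch below is only an illustration under assumptions: the sample text is an invented stand-in for the page source (read through an in-memory filehandle so it runs without the file), and the digits-and-commas pattern is a guess at what the data looks like.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the contents of productR0101.asp.
my $sample = join "\n",
    'some header line',
    '<td>20,000 units/ml</td>',
    'filler text',
    '<td>100,000 units/ml</td><td>100,000 units/ml</td>',
    '';

# Read the string through a filehandle, as if it were the file on disk.
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

my @concentrations;
while (my $line = <$fh>) {
    chomp $line;
    # push appends, so matches from earlier lines are kept
    push @concentrations, $line =~ m{([\d,]+)\s*units/ml}g;
}
close $fh;

print "$_ units/ml\n" for @concentrations;
```

With the real file, the open line would name 'productR0101.asp' instead of \$sample; everything else stays the same.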
Upvotes: 0