Reputation: 1949
So I have a file "Myoutput test.txt" of the form
#Some comments
#some more comments
A X word123_0988b 0.00132 -123.4 567
T E word123_0988b 0.00456 -231.4 897
H D word123_0988b 1.3132 -120.2 757
F Y word234_09876b 0.1231 -12344 789
A T word234_09876b 0.34531 -144 789
F Y word234_09876b 0.1231 -12344 789
G L word890_0987a 0.00012 -12312 654
And I want to build a list of the form
{{word123_0988b,A,T,H},{word234_09876b,F,A,F},{word890_0987a,G}}
where the first position of each sublist is the identifier in the 3rd column, and the other letters are all of the letters in the first column which this identifier is associated with.
To do this I was thinking about doing this:
However, I can't even do the 1st point. Here's what I have so far:
#!/usr/local/bin/perl
use strict;
use warnings;
my $dir='D:\test';
my ($out,$file);
open $out,"<", "$dir\\Myoutput test.txt" or die "problem opening out $!";
my @file = grep (!/^#/,<$out>); #ignores commented lines
while ($file =~ /(\w*word\w*)/g){
    print "$1\n"; #would print all words matching "word"
}
close $out;
Could someone give me some tips or any guidance on how to do this? Thank you so much!
Upvotes: 1
Views: 486
Reputation: 6204
When you:
my @file = grep (!/^#/,<$out>);
you're forcing the creation of a complete list of the file's lines, just to skip those which begin with #. Typically, this is handled in a while loop, so only one line at a time is read from the file, and skipped if not wanted.
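A minimal sketch of that line-at-a-time pattern, under some assumptions: the sub name identifiers_from is made up for illustration, and the input is an in-memory string standing in for the real file so the example runs on its own. It returns the third column of every non-comment line.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): reads one line at a
# time from an open filehandle and returns column 3 of each data line.
sub identifiers_from {
    my ($fh) = @_;
    my @ids;
    while ( my $line = <$fh> ) {
        next if $line =~ /^#/;           # skip comment lines as they are read
        my @cols = split ' ', $line;     # split on runs of whitespace
        push @ids, $cols[2] if defined $cols[2];
    }
    return @ids;
}

# In-memory stand-in for the real file (assumption for this sketch).
# With a file on disk: open my $fh, '<', $path or die "...: $!";
my $sample = "#Some comments\n"
           . "A X word123_0988b 0.00132 -123.4 567\n"
           . "G L word890_0987a 0.00012 -12312 654\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print "$_\n" for identifiers_from($fh);
close $fh;
```

Only the extracted identifiers are kept in memory here; the raw lines are discarded as soon as they have been examined.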
The data structure that would help here is a hash of arrays (HoA), where the keys are the identifiers and the values are references to lists of column 1 letters. Here's how this can be done:
use strict;
use warnings;
my %hash;
local $" = ',';
while (<DATA>) {
    next if /^#/;
    my @cols = split ' ', $_, 4;
    push @{ $hash{ $cols[2] } }, $cols[0];
}
print '{';
print "{$_,@{ $hash{$_} }}" for sort keys %hash;
print '}';
__END__
#Some comments
#some more comments
A X word123_0988b 0.00132 -123.4 567
T E word123_0988b 0.00456 -231.4 897
H D word123_0988b 1.3132 -120.2 757
F Y word234_09876b 0.1231 -12344 789
A T word234_09876b 0.34531 -144 789
F Y word234_09876b 0.1231 -12344 789
G L word890_0987a 0.00012 -12312 654
Output:
{{word123_0988b,A,T,H}{word234_09876b,F,A,F}{word890_0987a,G}}
The local $" = ','; notation makes a comma print between array elements when the array is interpolated (printed within a string). Each line is split with split's LIMIT set to 4, since only the first three columns are significant (splitting terminates after the third column). The push line creates the HoA. Finally, the HoA is printed.
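The output shown above has no commas between the sublists, while the form asked for in the question does. If that exact form matters, one possible variation is to build the string with join instead of relying on $". This is only a sketch: the sub name format_hoa and the three sample lines are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: formats a hash of arrays as {{key,v1,v2},{key2,v3}},
# with the keys in sorted order.
sub format_hoa {
    my ($hoa) = @_;
    my @groups;
    for my $id ( sort keys %$hoa ) {
        push @groups, '{' . join( ',', $id, @{ $hoa->{$id} } ) . '}';
    }
    return '{' . join( ',', @groups ) . '}';
}

my %hash;
my @sample = (
    'A X word123_0988b 0.00132 -123.4 567',
    'T E word123_0988b 0.00456 -231.4 897',
    'G L word890_0987a 0.00012 -12312 654',
);
for my $line (@sample) {
    # LIMIT of 4: fields 0..2 are split out, the rest stays unsplit in field 3
    my @cols = split ' ', $line, 4;
    push @{ $hash{ $cols[2] } }, $cols[0];
}

print format_hoa( \%hash ), "\n";
# prints {{word123_0988b,A,T},{word890_0987a,G}}
```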
Hope this helps!
Upvotes: 3
Reputation: 5139
The problem is that you aren't iterating through your array @file. You declared $file when you declared $out, so that's why you don't get any errors. You'll want to cycle through the array using a for loop instead. Try something like this:
#!/usr/local/bin/perl
use strict;
use warnings;
my $out;
open $out,"<", "test.txt" or die "problem opening out $!";
my @file = grep (!/^#/,<$out>); #ignores commented lines
for my $file (@file) {
    if ( $file =~ /(\w*word\w*)/ ) {
        print "$1\n"; #prints the first token containing "word" on each line
    }
}
close $out;
I changed the open statement so you'll have to change it back to your input file. Hopefully this gets you past being stuck on the first point. The output looks like:
matt@mattpc:~/Documents/test/4$ perl test.pl
word123_0988b
word123_0988b
word123_0988b
word234_09876b
word234_09876b
word234_09876b
word890_0987a
Upvotes: 3