Reputation: 285
I'm still having trouble with Perl programming and need a push to get a script working. I have two files, and I want to use the list file to "extract" rows from the data file. The problem is that the list file is formatted as follows:
X1 A B
X2 C D
X3 E F
And my data looks like this:
A X1 2 5
B X1 3 7
C X2 1 4
D X2 1 5
I need to take each pair of elements from the list file and use it to select the matching rows in the data file. At the same time, I would like to write output like this:
X1 A B 2 5 3 7
X2 C D 1 4 1 5
I'm trying to write Perl code for this, but I haven't been able to produce anything useful. This is where I am:
open (LIST, "< $fils_list") || die "Impossible to open the list";
@list = <LIST>;
close (LIST);
open (HAN, "< $data") || die "Impossible to open the data";
@r = <HAN>;
close (HAN);
for ($p=0; $p<=$#list; $p++){
    chomp ($list[$p]);
    ($x, $id1, $id2) = split (/\t/, $list[$p]);
    $pair_one = $id1."\t".$x;
    $pair_two = $id2."\t".$x;
    for ($i=0; $i<=$#r; $i++){
        chomp ($r[$i]);
        ($a, $b, $value1, $value2) = split (/\t/, $r[$i]);
        $bench = $a."\t".$b;
        if (($pair_one eq $bench) || ($pair_two eq $bench)){
            print "I don't know what this script should print!\n";
        }
    }
}
I can't work out what to print. Any suggestion is very welcome!
Upvotes: 0
Views: 112
Reputation: 1256
This is how I would do it, mostly using hashes together with hash and array references (test1.txt and test2.txt contain the data you provided in your example):
use strict;
use warnings;
# Read both files completely: test1.txt is the list file, test2.txt is the data file.
open(my $f1, '<', 'test1.txt') or die "Cannot open file1: $!\n";
open(my $f2, '<', 'test2.txt') or die "Cannot open file2: $!\n";
my @data1 = <$f1>;
my @data2 = <$f2>;
close($f1);
close($f2);
chomp @data1;
chomp @data2;
my %result;
# Prepare an empty value list for every (key, element) pair named in the list file.
foreach my $line1 (@data1) {
    my @fields1 = split(' ', $line1);
    $result{$fields1[0]}->{$fields1[1]} = [];
    $result{$fields1[0]}->{$fields1[2]} = [];
}
# Append the two value columns of each data row to its pair.
foreach my $line2 (@data2) {
    my @fields2 = split(' ', $line2);
    push @{ $result{$fields2[1]}->{$fields2[0]} }, @fields2[2, 3];
}
# Print one line per pair; the values are not sorted, so they keep their column order.
foreach my $res (sort keys %result) {
    foreach my $id (sort keys %{ $result{$res} }) {
        print join(" ", $res, $id, @{ $result{$res}->{$id} }), "\n";
    }
}
Upvotes: 1
Reputation: 57650
A few general recommendations:
Use descriptive variable names rather than names like $a or $value1 (if I do so below, this is due to my lack of domain knowledge).
Always start your scripts with use strict; use warnings;
Consider use autodie for automatic error handling.
Also, use the open function like open my $fh, "<", $filename as this is safer.
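As a minimal sketch of the last two recommendations combined: the helper name read_lines and the temporary sample file are assumptions made for illustration, not part of the original scripts.

```perl
use strict;
use warnings;
use autodie;                       # open/close now die with a useful message on failure
use File::Temp qw(tempfile);

# Hypothetical helper: return all lines of a file, with newlines removed.
sub read_lines {
    my ($filename) = @_;
    open my $fh, '<', $filename;   # three-argument open with a lexical filehandle
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

# Write a small sample list file so the sketch is self-contained.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "X1\tA\tB\n", "X2\tC\tD\n";
close $tmp_fh;

my @lines = read_lines($tmp_name);
print scalar(@lines), " lines read\n";   # prints "2 lines read"
```

Because autodie is in effect, no explicit "or die" is needed after open; a missing file raises an exception that includes the file name and the system error.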
Remember what I said about data structures? In the second file, you have entries like
A X1 2 5
This looks like a secondary key, a primary key, and some data columns. Key-value relationships are best expressed through a hash table.
use strict; use warnings; use autodie;
use feature 'say'; # available since 5.010
open my $data_fh, "<", $data;
my %data;
while (<$data_fh>) {
    chomp; # remove newlines
    my ($id2, $id1, @data) = split /\t/;
    $data{$id1}{$id2} = \@data;
}
Now %data is a nested hash which we can use for easy lookups:
open my $list_fh, "<", $fils_list;
LINE: while (<$list_fh>) {
    chomp;
    my ($id1, @id2s) = split /\t/;
    my $data_id1 = $data{$id1};
    defined $data_id1 or next LINE; # maybe there isn't anything here. Then skip
    # map the 2nd level ids to their values and flatten the list;
    # an id without a data row contributes no values instead of causing an error
    my @values = map @{ $data_id1->{$_} // [] }, @id2s;
    # now print everything out:
    say join "\t", $id1, @id2s, @values;
}
The map function is a bit like a foreach loop, and builds a list of values. We need the @{ ... } here because the data structure doesn't hold arrays, but references to arrays. The @{ ... } is a dereference operator.
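To make the map and dereference step concrete, here is a small sketch using the sample rows from the question. The hard-coded hash and the helper name flatten_values are illustrative assumptions; in the script above, the hash is built from the data file instead.

```perl
use strict;
use warnings;

# Same shape as the nested hash above: primary key -> secondary key -> array reference.
my %data = (
    X1 => { A => [2, 5], B => [3, 7] },
    X2 => { C => [1, 4], D => [1, 5] },
);

# Hypothetical helper: look up each secondary id and flatten the referenced arrays.
# An id with no entry contributes nothing because of the empty-array fallback.
sub flatten_values {
    my ($by_id2, @id2s) = @_;
    return map @{ $by_id2->{$_} || [] }, @id2s;
}

my @values = flatten_values($data{X1}, 'A', 'B');
print join("\t", 'X1', 'A', 'B', @values), "\n";   # X1 A B 2 5 3 7 (tab-separated)
```

Without the @{ ... }, map would return the two array references themselves rather than the four values they point to.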
Upvotes: 2