Reputation: 59
The code is getting messy somewhere in the loop! Please help me to solve it.
Replicate and/or reassign most of the array elements in the multi-dimensional array, using reference elements.
File-1: List of array indices & the elements that needs to be maintained in the original array.
File-2: The original multi-dimensional array that needs to be rewritten with the above info. Except the elements from the above, the rest of all elements have to be reassigned.
File-3: Expected output (reassigned array elements)
Note: Addition to the array indices from file1, rest of all the indices will be replaced with the reference line. Reference line is usually present in the first line of the array.
In the modified array, the reference line is not needed.
File-1:
ID1 2 E1,E4
ID2 5 E6,E7,E9
ID3 1 E3
File-2:
ID1.txt
Ref K L M N O P A B C D
E1 S H G U S K R E K K
E2 S L G N O P A B C D
E3 S L G N O P A B C D
E4 U L G G O P A B C D
E5 U L M G O P A J C D
E6 U L M G O P A J C D
E7 U L M G O P A J C D
E8 U L M G O P A J C D
E9 S L M N O P A J C D
E10 S L M N O P A J C D
.
.
.
File-3: Expected output
new_ID1.txt
E1 K L G N O P A B C D
E2 K L M N O P A B C D
E3 K L M N O P A B C D
E4 K L G N O P A B C D
E5 K L M N O P A B C D
E6 K L M N O P A B C D
E7 K L M N O P A B C D
E8 K L M N O P A B C D
E9 K L M N O P A B C D
E10 K L M N O P A B C D
.
.
.
In the expected output, (new_ID1.txt), second index of the array for "E1" and "E4" is maintained from the original array. Everything else is replaced by the reference line in "E2,E3,E5...".
#!/usr/bin/perl
use strict;
use warnings;
my %HoHoA = ();
open(IN,"ids.txt");
my @ids = <IN>; chomp @ids; close IN;
open(IN2,"indices_and_values.txt");
while(my $l = <IN2>)
{
chomp $l;
my @tmp = split "\t", $l;
my $lid = $tmp[0];
my $pos = $tmp[1];
my @gps = @tmp[2..$#tmp];
foreach my $g (@gps)
{
push @{$HoHoA{$lid}{$g}}, $pos;
}
}
close IN2;
foreach my $outer (sort keys %HoHoA)
{
open(IN3,"$outer.txt");
my @rS = <IN3>; chomp @rS; close IN3;
my @orgArr = (); my @refArr = (); my @newArr = ();
foreach my $unk (@rS)
{
@orgArr = split "\t", $unk;
if($unk =~ /^Ref/)
{
@refArr = split "\t", $unk;
next;
}
foreach my $inner (sort keys %{$HoHoA{$outer}})
{
if($inner =~ /^$orgArr[0]/)
{
foreach my $ele (sort {$a <=> $b} @{$HoHoA{$outer}{$inner}})
{
$refArr[$ele] = $orgArr[$ele];
}
}
#else
#{
#}
}
print ">$orgArr[0]\t";
print join("\t",@refArr[1..$#refArr]);
print "\n";
}
@rS = ();
print "\n";
}
Upvotes: 3
Views: 109
Reputation: 66881
The shown code is well-meant but a bit too complicated; you may have lost your way in the maneuvers over the nested data structure. Here's another, simpler, approach.
Parse the information from the "reference" file (File-1
) into a hash (E1 => [2, ...], ..)
. I put indices for data to be kept in an arrayref to allow for multiple indices for a row. Then go line by line, replacing data at these indices for rows that have a key, and print output as you go.
use warnings;
use strict;
use feature 'say';
my ($ref_file, $data_file) = @ARGV;
die "Usage: $0 ref-file data-file\n" if not $ref_file or not $data_file;
open my $fh, '<', $ref_file or die "Can't open $ref_file: $!";
my %rows;
while (<$fh>) {
my (undef, $idx, $row_id) = split;
for (split /,/, $row_id) {
push @{$rows{$_}}, $idx; # elem => [ indices ]
}
}
my $outfile = 'new_' . $data_file;
open $fh, '<', $data_file or die "Can't open $data_file: $!";
open my $fh_out, '>', $outfile or die "Can't open $outfile: $!";
my @ref = split ' ', <$fh>;
shift @ref; # toss the first field
while (<$fh>) {
my ($row_id, @data) = split;
if (exists $rows{$row_id}) { # this row needs attention
my @new_row = @ref;
foreach my $idx (@{$rows{$row_id}}) { # keep data at these indices
$new_row[$idx] = $data[$idx];
}
say $fh_out join "\t", $row_id, @new_row;
}
else { # use whole reference line
say $fh_out join "\t", $row_id, @ref;
}
}
The new file (shown with two spaces instead of the actual tabs, for readability)
E1 K L G N O P A B C D E2 K L M N O P A B C D E3 K L M N O P A B C D E4 K L G N O P A B C D E5 K L M N O P A B C D E6 K L M N O P A B C D E7 K L M N O P A B C D E8 K L M N O P A B C D E9 K L M N O P A B C D E10 K L M N O P A B C D
Note that the given input file happens to have the same entries as the reference line to use in replacement at many indices of interest -- so we can't see those "changes" in the above output. (I tested by changing the input file so to be able to see.)
Upvotes: 2
Reputation: 163
This is one way to do it, if I understood your problem statement correctly:
#!/usr/bin/perl
use strict;
use warnings;
my %keep_idx;
open FILE, "file-1" or die "Couldn't open file-1";
while(<FILE>) {
my (undef, $idx, $id_str) = split /\s+/;
my @ids = split /,/, $id_str;
foreach my $id (@ids) {
$keep_idx{$id}{$idx} = 1;
}
}
close FILE;
open FILE, "file-2" or die "Couldn't open file-2";
open OUTFILE, ">file-3" or die "Couldn't open file-3";
my (undef, @ref) = split /\s+/, <FILE>;
while(<FILE>) {
my ($id, @src) = split /\s+/;
my $line = "$id";
for (my $i = 0; $i <= $#src; $i++) {
my $e = $keep_idx{$id}{$i} ? $src[$i] : $ref[$i];
$line .= " $e";
}
print OUTFILE "$line\n";
}
close OUTFILE;
close FILE;
Upvotes: 1