How to replicate and/or reassign array elements in a multi-dimensional array?

Question

The code is getting messy somewhere in the loop! Please help me to solve it.

Details

Replicate and/or reassign most of the array elements in the multi-dimensional array, using reference elements.

File-1: List of array indices & the elements that needs to be maintained in the original array.
File-2: The original multi-dimensional array that needs to be rewritten with the above info. Except the elements from the above, the rest of all elements have to be reassigned.
File-3: Expected output (reassigned array elements)

Note: Addition to the array indices from file1, rest of all the indices will be replaced with the reference line. Reference line is usually present in the first line of the array.
In the modified array, the reference line is not needed.

File-1:

ID1    2    E1,E4
ID2    5    E6,E7,E9
ID3    1    E3

File-2:

ID1.txt

Ref K L M N O P A B C D
E1 S H G U S K R E K K
E2 S L G N O P A B C D
E3 S L G N O P A B C D
E4 U L G G O P A B C D
E5 U L M G O P A J C D
E6 U L M G O P A J C D
E7 U L M G O P A J C D
E8 U L M G O P A J C D
E9 S L M N O P A J C D
E10 S L M N O P A J C D
.
.
.

File-3: Expected output

new_ID1.txt

E1    K L G N O P A B C D
E2    K L M N O P A B C D
E3    K L M N O P A B C D
E4    K L G N O P A B C D
E5    K L M N O P A B C D
E6    K L M N O P A B C D
E7    K L M N O P A B C D
E8    K L M N O P A B C D
E9    K L M N O P A B C D
E10    K L M N O P A B C D
.
.
.

In the expected output, (new_ID1.txt), second index of the array for "E1" and "E4" is maintained from the original array. Everything else is replaced by the reference line in "E2,E3,E5...".

Code

#!/usr/bin/perl 

use strict;
use warnings;

my %HoHoA = ();

open(IN,"ids.txt");
my @ids = ; chomp @ids; close IN;

open(IN2,"indices_and_values.txt");

while(my $l = )
{
    chomp $l;
my @tmp = split "	", $l;
my $lid = $tmp[0];
my $pos = $tmp[1];
my @gps = @tmp[2..$#tmp];

    foreach my $g (@gps)
    {
        push @{$HoHoA{$lid}{$g}}, $pos;
    }
}
close IN2;


foreach my $outer (sort keys %HoHoA)
{
open(IN3,"$outer.txt");
my @rS = ; chomp @rS; close IN3;

    my @orgArr = (); my @refArr = (); my @newArr = ();
    foreach my $unk (@rS) 
    { 
        @orgArr = split "	", $unk;
        if($unk =~ /^Ref/)
        { 
            @refArr = split "	", $unk;
            next;
        }
    foreach my $inner (sort keys %{$HoHoA{$outer}})
    {
        if($inner =~ /^$orgArr[0]/)
        {
            foreach my $ele (sort {$a <=> $b} @{$HoHoA{$outer}{$inner}})
            {
                $refArr[$ele] = $orgArr[$ele];
            }
        }
        #else
        #{
        #}
    }
    print ">$orgArr[0]	";
    print join("	",@refArr[1..$#refArr]);
    print "
";
}
    @rS = ();
    print "
";

}

zdim · Accepted Answer

The shown code is well-meant but a bit too complicated; you may have lost your way in the maneuvers over the nested data structure. Here's another, simpler, approach.

Parse the information from the "reference" file (File-1) into a hash (E1 => [2, ...], ..). I put indices for data to be kept in an arrayref to allow for multiple indices for a row. Then go line by line, replacing data at these indices for rows that have a key, and print output as you go.

use warnings;
use strict;
use feature 'say';

my ($ref_file, $data_file) = @ARGV;
die "Usage: $0 ref-file data-file
" if not $ref_file or not $data_file;

open my $fh, '<', $ref_file or die "Can't open $ref_file: $!";
my %rows;
while (<$fh>) {
    my (undef, $idx, $row_id) = split;
    for (split /,/, $row_id) {
        push @{$rows{$_}}, $idx;        # elem => [ indices ]
    }
}

my $outfile = 'new_' . $data_file;
open    $fh,     '<', $data_file  or die "Can't open $data_file: $!";
open my $fh_out, '>', $outfile    or die "Can't open $outfile: $!";

my @ref = split ' ', <$fh>;
shift @ref;                  # toss the first field

while (<$fh>) {
    my ($row_id, @data) = split;

    if (exists $rows{$row_id}) {              # this row needs attention
        my @new_row = @ref;
        foreach my $idx (@{$rows{$row_id}}) { # keep data at these indices
            $new_row[$idx] = $data[$idx];
        }
        say $fh_out join "	", $row_id, @new_row;
    }
    else {                                    # use whole reference line
        say $fh_out join "	", $row_id, @ref;
    }
}

The new file (shown with two spaces instead of the actual tabs, for readability)

E1  K  L  G  N  O  P  A  B  C  D
E2  K  L  M  N  O  P  A  B  C  D
E3  K  L  M  N  O  P  A  B  C  D
E4  K  L  G  N  O  P  A  B  C  D
E5  K  L  M  N  O  P  A  B  C  D
E6  K  L  M  N  O  P  A  B  C  D
E7  K  L  M  N  O  P  A  B  C  D
E8  K  L  M  N  O  P  A  B  C  D
E9  K  L  M  N  O  P  A  B  C  D
E10  K  L  M  N  O  P  A  B  C  D

Note that the given input file happens to have the same entries as the reference line to use in replacement at many indices of interest -- so we can't see those "changes" in the above output. (I tested by changing the input file so to be able to see.)

How to replicate and/or reassign array elements in a multi-dimensional array?

Details

Code

Answers (2)

Related Questions