perlbeginner
perlbeginner

Reputation: 59

How to replicate and/or reassign array elements in a multi-dimensional array?

The code is getting messy somewhere in the loop! Please help me to solve it.

Details

Replicate and/or reassign most of the array elements in the multi-dimensional array, using reference elements.

Note: Addition to the array indices from file1, rest of all the indices will be replaced with the reference line. Reference line is usually present in the first line of the array.
In the modified array, the reference line is not needed.

File-1:

ID1    2    E1,E4
ID2    5    E6,E7,E9
ID3    1    E3

File-2:

ID1.txt

Ref K L M N O P A B C D
E1 S H G U S K R E K K
E2 S L G N O P A B C D
E3 S L G N O P A B C D
E4 U L G G O P A B C D
E5 U L M G O P A J C D
E6 U L M G O P A J C D
E7 U L M G O P A J C D
E8 U L M G O P A J C D
E9 S L M N O P A J C D
E10 S L M N O P A J C D
.
.
.

File-3: Expected output

new_ID1.txt

E1    K L G N O P A B C D
E2    K L M N O P A B C D
E3    K L M N O P A B C D
E4    K L G N O P A B C D
E5    K L M N O P A B C D
E6    K L M N O P A B C D
E7    K L M N O P A B C D
E8    K L M N O P A B C D
E9    K L M N O P A B C D
E10    K L M N O P A B C D
.
.
.

In the expected output, (new_ID1.txt), second index of the array for "E1" and "E4" is maintained from the original array. Everything else is replaced by the reference line in "E2,E3,E5...".

Code

#!/usr/bin/perl 

use strict;
use warnings;

my %HoHoA = ();

open(IN,"ids.txt");
my @ids = <IN>; chomp @ids; close IN;

open(IN2,"indices_and_values.txt");

while(my $l = <IN2>)
{
    chomp $l;
my @tmp = split "\t", $l;
my $lid = $tmp[0];
my $pos = $tmp[1];
my @gps = @tmp[2..$#tmp];

    foreach my $g (@gps)
    {
        push @{$HoHoA{$lid}{$g}}, $pos;
    }
}
close IN2;


foreach my $outer (sort keys %HoHoA)
{
open(IN3,"$outer.txt");
my @rS = <IN3>; chomp @rS; close IN3;

    my @orgArr = (); my @refArr = (); my @newArr = ();
    foreach my $unk (@rS) 
    { 
        @orgArr = split "\t", $unk;
        if($unk =~ /^Ref/)
        { 
            @refArr = split "\t", $unk;
            next;
        }
    foreach my $inner (sort keys %{$HoHoA{$outer}})
    {
        if($inner =~ /^$orgArr[0]/)
        {
            foreach my $ele (sort {$a <=> $b} @{$HoHoA{$outer}{$inner}})
            {
                $refArr[$ele] = $orgArr[$ele];
            }
        }
        #else
        #{
        #}
    }
    print ">$orgArr[0]\t";
    print join("\t",@refArr[1..$#refArr]);
    print "\n";
}
    @rS = ();
    print "\n";

}

Upvotes: 3

Views: 109

Answers (2)

zdim
zdim

Reputation: 66881

The shown code is well-meant but a bit too complicated; you may have lost your way in the maneuvers over the nested data structure. Here's another, simpler, approach.

Parse the information from the "reference" file (File-1) into a hash (E1 => [2, ...], ..). I put indices for data to be kept in an arrayref to allow for multiple indices for a row. Then go line by line, replacing data at these indices for rows that have a key, and print output as you go.

use warnings;
use strict;
use feature 'say';

my ($ref_file, $data_file) = @ARGV;
die "Usage: $0 ref-file data-file\n" if not $ref_file or not $data_file;

open my $fh, '<', $ref_file or die "Can't open $ref_file: $!";
my %rows;
while (<$fh>) {
    my (undef, $idx, $row_id) = split;
    for (split /,/, $row_id) {
        push @{$rows{$_}}, $idx;        # elem => [ indices ]
    }
}

my $outfile = 'new_' . $data_file;
open    $fh,     '<', $data_file  or die "Can't open $data_file: $!";
open my $fh_out, '>', $outfile    or die "Can't open $outfile: $!";

my @ref = split ' ', <$fh>;
shift @ref;                  # toss the first field

while (<$fh>) {
    my ($row_id, @data) = split;

    if (exists $rows{$row_id}) {              # this row needs attention
        my @new_row = @ref;
        foreach my $idx (@{$rows{$row_id}}) { # keep data at these indices
            $new_row[$idx] = $data[$idx];
        }
        say $fh_out join "\t", $row_id, @new_row;
    }
    else {                                    # use whole reference line
        say $fh_out join "\t", $row_id, @ref;
    }
}

The new file (shown with two spaces instead of the actual tabs, for readability)

E1  K  L  G  N  O  P  A  B  C  D
E2  K  L  M  N  O  P  A  B  C  D
E3  K  L  M  N  O  P  A  B  C  D
E4  K  L  G  N  O  P  A  B  C  D
E5  K  L  M  N  O  P  A  B  C  D
E6  K  L  M  N  O  P  A  B  C  D
E7  K  L  M  N  O  P  A  B  C  D
E8  K  L  M  N  O  P  A  B  C  D
E9  K  L  M  N  O  P  A  B  C  D
E10  K  L  M  N  O  P  A  B  C  D

Note that the given input file happens to have the same entries as the reference line to use in replacement at many indices of interest -- so we can't see those "changes" in the above output. (I tested by changing the input file so to be able to see.)

Upvotes: 2

Automaton
Automaton

Reputation: 163

This is one way to do it, if I understood your problem statement correctly:

#!/usr/bin/perl

use strict;
use warnings;

my %keep_idx;

open FILE, "file-1" or die "Couldn't open file-1";
while(<FILE>) {
    my (undef, $idx, $id_str) = split /\s+/;
    my @ids = split /,/, $id_str;
    foreach my $id (@ids) {
        $keep_idx{$id}{$idx} = 1;
    }
}
close FILE;

open FILE, "file-2" or die "Couldn't open file-2";
open OUTFILE, ">file-3" or die "Couldn't open file-3";
my (undef, @ref) = split /\s+/, <FILE>;
while(<FILE>) {
    my ($id, @src) = split /\s+/;
    my $line = "$id";
    for (my $i = 0; $i <= $#src; $i++) {
        my $e = $keep_idx{$id}{$i} ? $src[$i] : $ref[$i];
        $line .= " $e";
    }
    print OUTFILE "$line\n";
}
close OUTFILE;
close FILE;

Upvotes: 1

Related Questions