firoz

Reputation: 91

Merge records that share the same ID using a hash of arrays

I have two files: (1) Uniq_ID.txt, which contains IDs, and (2) Information.txt, which contains an ID followed by its corresponding information. Each ID may have one or more lines of information in file (2). For every ID that appears in both files, I want to print all of its information on a single line, separated by ";".

(1) Uniq_ID.txt

a12
b13
c14
d15

(2) Information.txt

a12 AAA BBB
a12 ppp yyy
b13 CCC DDD
b13 GGG SSS
c14 HHH KKK
c14 JJJ OOO
d15 LLL LLL

Expected Output

a12:a12 AAA BBB;a12 ppp yyy
b13:b13 CCC DDD;b13 GGG SSS
c14:c14 HHH KKK;c14 JJJ OOO
d15:d15 LLL LLL

program.pl

#!/usr/bin/perl                                                                                          
#./program.pl Uniq_ID.txt Information.txt                                                                        
@aa=();
%data=();
@arrayname=();
$file1=$ARGV[0];
$file2=$ARGV[1];
open(FP1, $file1);
while($name1=<FP1>)
{
  chomp($name1);
  #collect the name according to the Uniq_ID                                                           
  $arrayname[$i]=$name1;
  $i++;
  open(FP2, $file2);
  while($info=<FP2>)
  {
    chomp($info);
    @aa=split(/\s/,$info);
    $name2=$aa[0];
    $seq=$aa[1];
    #if name in Uniq_ID is same with name in information.txt                                                    
    if($name1 =~ /^$name2$/)
    {                                                          
        #hash of arrays"                                                                          
        #put each line of information.txt into a Uniq_ID                                                     
        push @{$data{$arrayname[$i]}}, $info;
      }
  }
}
foreach (@arrayname){
print "$_:\t@{$data{$_}}\n";
}

I run the program with "./program.pl Uniq_ID.txt Information.txt" but get the following result:

a12:
b13:
c14:
d15:

Could you kindly tell me what is wrong with my program? Thanks.

Upvotes: 0

Views: 123

Answers (4)

Miller

Reputation: 35208

Always include use strict; and use warnings; in EVERY script. Also include use autodie; if you do any file processing.
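As a rough illustration of why this advice matters here, the sketch below reproduces the question's out-of-step index in isolation and captures the warning that use warnings would raise. The sample values and the @warnings array are assumptions made for the demonstration, not part of the original program.

```perl
use strict;
use warnings;

# Collect warnings in an array instead of printing them to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my @arrayname = ('a12');   # one ID stored at index 0
my $i         = 1;         # already incremented, as in the question
my %data;

# $arrayname[1] does not exist, so the hash key is undef (stored as '').
push @{ $data{ $arrayname[$i] } }, 'a12 AAA BBB';

print "keys: ", join(',', map { "'$_'" } keys %data), "\n";
print "warnings raised: ", scalar(@warnings), "\n";
print $warnings[0] if @warnings;
```

With warnings enabled, the uninitialized hash key is reported at the line of the push, which points fairly directly at the index problem.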

Your code can be simplified a lot by processing each file just once, as demonstrated below:

use strict;
use warnings;
use autodie;

my ($id_file, $info_file) = @ARGV;

my %info;
open my $fh, '<', $info_file; # to test without files, open this in-memory string instead: \"a12 AAA BBB\na12 ppp yyy\nb13 CCC DDD\nb13 GGG SSS\nc14 HHH KKK\nc14 JJJ OOO\nd15 LLL LLL"
while (<$fh>) {
    chomp;
    my ($id) = split;
    push @{$info{$id}}, $_;
}

open $fh, '<', $id_file; # to test without files, open this in-memory string instead: \"a12\nb13\nc14\nd15"
while (<$fh>) {
    chomp;
    print "$_:" . join(';', @{$info{$_}}) . "\n";
}

Output:

a12:a12 AAA BBB;a12 ppp yyy
b13:b13 CCC DDD;b13 GGG SSS
c14:c14 HHH KKK;c14 JJJ OOO
d15:d15 LLL LLL
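One case the sample data does not exercise is an ID that has no lines in Information.txt. In that case the hash entry is undefined, and dereferencing it under use strict can stop the script. The following is a minimal sketch of one way to guard against that. It assumes Perl 5.10 or later for the // operator, reads from in-memory strings so it runs without any files, and uses a hypothetical ID e16 that is not in the question's data.

```perl
use strict;
use warnings;

# Hypothetical sample data: 'e16' has no lines in the information text.
my $info_text = "a12 AAA BBB\na12 ppp yyy\nd15 LLL LLL\n";
my $id_text   = "a12\ne16\nd15\n";

# Build a hash of arrays keyed by the first field of each line.
my %info;
open my $fh, '<', \$info_text or die "Cannot open in-memory data: $!";
while (<$fh>) {
    chomp;
    my ($id) = split;
    next unless defined $id;            # skip blank lines
    push @{ $info{$id} }, $_;
}
close $fh;

# Look up each ID, falling back to an empty list when nothing was stored.
my @lines;
open $fh, '<', \$id_text or die "Cannot open in-memory data: $!";
while (my $id = <$fh>) {
    chomp $id;
    next unless length $id;
    my $records = $info{$id} // [];
    push @lines, "$id:" . join(';', @$records);
}
close $fh;

print "$_\n" for @lines;
```

An ID with no information is printed with an empty right-hand side. Skipping such IDs with next unless $info{$id} would be an equally reasonable choice, depending on what the output is used for.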

Upvotes: 2

jaypal singh

Reputation: 77115

Here is a Perl one-liner that can be run from the command line. Note that Information.txt is given first and Uniq_ID.txt last:

perl -lne '
BEGIN {
    $x = pop; 
    push @{$h{$_->[0]}}, "@$_" for map [split], <>; 
    @ARGV = $x
}
print "$_:" . join ";" , @{ $h{$_} }' Information.txt Uniq_ID.txt

Output:

a12:a12 AAA BBB;a12 ppp yyy
b13:b13 CCC DDD;b13 GGG SSS
c14:c14 HHH KKK;c14 JJJ OOO
d15:d15 LLL LLL

Upvotes: 1

Borodin

Reputation: 126732

All that is necessary is to push each line onto the appropriate element of the hash. It looks like this:

use strict;
use warnings;
use autodie;

my @ids;

open my $fh, '<', 'Uniq_ID.txt';
push @ids, (split)[0] while <$fh>;

my %data;

open $fh, '<', 'Information.txt';
while (<$fh>) {
  chomp;
  my ($id) = split;
  push @{ $data{$id} }, $_;
}

for my $id (@ids) {
  printf "%s:%s\n", $id, join ';', @{ $data{$id} };
}

Output:

a12:a12 AAA BBB;a12 ppp yyy
b13:b13 CCC DDD;b13 GGG SSS
c14:c14 HHH KKK;c14 JJJ OOO
d15:d15 LLL LLL

Upvotes: 2

user325117

Reputation:

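The problem is that you store $name1 in @arrayname at index $i and then increment $i straight away. When you later read $arrayname[$i], you are looking one element past the one you just stored. That element is undefined, so every line is pushed under the wrong hash key, and the final loop finds nothing under the real IDs. Increment $i after the inner loop instead, or better, use push and drop the index altogether.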

while($name1=<FP1>)
{
  chomp($name1);
  #collect the name according to the Uniq_ID                                                           
  $arrayname[$i]=$name1; # <-- You insert into the array at $i here 
  $i++;                  # <-- You increment $i here
  open(FP2, $file2);
  while($info=<FP2>)
  {
    chomp($info);
    @aa=split(/\s/,$info);
    $name2=$aa[0];
    $seq=$aa[1];
    #if name in Uniq_ID is same with name in information.txt                                                    
    if($name1 =~ /^$name2$/)
    {                                                          
        #hash of arrays"                                                                          
        #put each line of information.txt into a Uniq_ID                                                     
        push @{$data{$arrayname[$i]}}, $info; # <-- You access the element at $i here
      }
  }
  # <-- You should increment $i here (but use push instead)
}
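For reference, here is a minimal sketch of the original nested-loop approach with that fix applied. It is not the only possible correction: it uses push for the ID list, keys the hash by $name1 directly, and compares with eq rather than a regex. The in-memory sample strings and the @out array are assumptions so the snippet runs without files, and // needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Shortened sample data, read from in-memory strings instead of files.
my $ids_text  = "a12\nb13\n";
my $info_text = "a12 AAA BBB\na12 ppp yyy\nb13 CCC DDD\n";

my (@arrayname, %data);

open my $fp1, '<', \$ids_text or die "Cannot open in-memory data: $!";
while (my $name1 = <$fp1>) {
    chomp $name1;
    push @arrayname, $name1;            # no index variable to get out of step

    # Re-read the information for every ID, as the original program does.
    open my $fp2, '<', \$info_text or die "Cannot open in-memory data: $!";
    while (my $info = <$fp2>) {
        chomp $info;
        my ($name2) = split ' ', $info;
        next unless defined $name2;     # skip blank lines
        push @{ $data{$name1} }, $info if $name1 eq $name2;
    }
    close $fp2;
}
close $fp1;

# Join each ID's lines with ';' (an ID with no lines gets an empty list).
my @out = map { "$_:" . join(';', @{ $data{$_} // [] }) } @arrayname;
print "$_\n" for @out;
```

Re-reading the information for every ID scales poorly on large inputs, which is why the other answers read each file only once.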

Upvotes: 0
