Reputation: 3
I have two different files that hold the same kind of data, i.e.:
KEY_gl Start_gl End_gl
1 114029 17
2 284 1624
3 1803 2942
4 3070 3282
5 3295 4422
KEY_gm Start_gm End_gm
1 115000 17
2 284 1624
3 1803 2942
4 3070 3282
5 3295 4422
I have loaded these two files into two hashes. The "Key" column is the hash key, and the start and end columns are stored as the value for each key.
I have written code to compare these two hashes and print which keys have the same values in both files and which do not.
The code is:
my %hash_gl = ();
my %hash_gm = ();
open( my $fgl, "/home/gaurav/GMR/new_gl.txt" ) or die "Can't open the file";
while ( my $line_gl = <$fgl> ) {
chomp $line_gl;
my ( $key_gl, $start_gl, $end_gl ) = split( "\t", $line_gl );
$hash_gl{$key_gl} = [ $start_gl, $end_gl ];
}
while ( my ( $key_gl, $val_gl ) = each %hash_gl ) {
#print "$key_gl => @{$val_gl}\n";
}
open( my $fgm, "/home/gaurav/GMR/new_gm.txt" ) or die "Can't open the file";
while ( my $line_gm = <$fgm> ) {
chomp $line_gm;
my ( $key_gm, $start_gm, $end_gm ) = split( "\t", $line_gm );
$hash_gm{$key_gm} = [ $start_gm, $end_gm ];
}
while ( my ( $key_gm, $val_gm ) = each %hash_gm ) {
#print "$key_gm => @{$val_gm}\n";
}
for ( sort keys %hash_gl ) {
unless ( exists $hash_gm{$_} ) {
print "$_: not found in second hash\n";
next;
}
if ( $hash_gm{$_} == $hash_gl{$_} ) {
print "$_: values are equal\n";
} else {
print "$_: values are not equal\n";
}
}
Please point out the errors in this code, as I am not getting the desired output. Also, I am new to this forum, so I apologize if the formatting is not correct.
Upvotes: 0
Views: 225
Reputation: 100
#!/usr/bin/perl
use warnings;
use strict;
use feature 'say';
my %hash_gl = (
1 => [ 114029, 17 ],
2 => [ 284, 1624 ],
3 => [ 1803, 2942 ],
4 => [ 3070, 3282 ],
5 => [ 3295, 442 ],
);
my %hash_gm = (
1 => [ 115000, 17 ],
2 => [ 284, 1624 ],
3 => [ 1803, 2942 ],
4 => [ 3070, 3282 ],
5 => [ 3295, 4422 ],
);
sub check_hash_size {
my $hash_gl = shift;
my $hash_gm = shift;
if ((keys %$hash_gl) != (keys %$hash_gm)) {
say "the hashes are 2 different sizes";
}
else
{
say "the hashes are the same size";
}
}
sub diag_hashes {
my $hash_gl = shift;
my $hash_gm = shift;
for my $gl_key ( keys %$hash_gl ) {
if ( (scalar @{$$hash_gl{$gl_key}}) != (scalar @{$$hash_gm{$gl_key}}) ) {
say "$gl_key entry arrays are different sizes";
}
else
{
say "arrays are the same size for key $gl_key";
}
# test each array length separately; "(A or B) > 2" would only ever test one of them
if ( (scalar @{$$hash_gl{$gl_key}}) > 2 or (scalar @{$$hash_gm{$gl_key}}) > 2 ) {
say "$gl_key entry array exceeds 2 values";
}
if ($$hash_gl{$gl_key}[0] eq $$hash_gm{$gl_key}[0]) {
say "$gl_key start is the same in both hashes";
}
else
{
say "** key $gl_key start is different";
}
if ($$hash_gl{$gl_key}[1] eq $$hash_gm{$gl_key}[1]) {
print "$gl_key end is the same in both hashes","\n";
}
else
{
say "** key $gl_key end is different";
}
}
}
check_hash_size( \%hash_gl ,\%hash_gm);
diag_hashes( \%hash_gl ,\%hash_gm);
Upvotes: 0
Reputation: 6578
Assuming that you want to compare the values, say the start position, here's how I'd do it:
use warnings;
use strict;
open my $in, '<', '1.txt' or die "$!\n";
open my $in2, '<', '2.txt' or die "$!\n";
my (%hash1, %hash2);
while (<$in>){
chomp;
next unless /^\s*\d/;    # skip the header and anything else that is not a data row
my ($key, $start, $stop) = /^\s*(\d+)\s+(\d+)\s+(\d+)/;
$hash1{$key} = [$start, $stop];
}
while (<$in2>){
chomp;
next unless /^\s*\d/;    # skip the header and anything else that is not a data row
my ($key, $start, $stop) = /^\s*(\d+)\s+(\d+)\s+(\d+)/;
$hash2{$key} = [$start, $stop];
}
for my $key (sort keys %hash1){
next unless exists $hash2{$key};    # key is only present in file 1
if ($hash1{$key}[0] == $hash2{$key}[0]){
print "start matches: file1 $hash1{$key}[0]\tfile2 $hash2{$key}[0]\n";
}
else {
print "start doesn't match: file1 $hash1{$key}[0]\tfile2 $hash2{$key}[0]\n";
}
}
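The two read loops above differ only in their variable names, so they could be folded into one helper. This is a minimal sketch, not a drop-in replacement: `load_table` is a hypothetical name, the columns are assumed to be whitespace-separated integers, and an in-memory filehandle stands in for the real files.

```perl
use strict;
use warnings;

# Hypothetical helper: read "key start end" rows from an open filehandle
# into a hash of array refs. Lines that are not three integers
# (for example the header row) are skipped.
sub load_table {
    my ($fh) = @_;
    my %table;
    while ( my $line = <$fh> ) {
        # On a failed match the list assignment yields 0, so we skip the line.
        my ( $key, $start, $stop ) = $line =~ /^\s*(\d+)\s+(\d+)\s+(\d+)\s*$/
            or next;
        $table{$key} = [ $start, $stop ];
    }
    return \%table;
}

# Sample data assumed for illustration; a real script would open the files.
my $text = "KEY_gl Start_gl End_gl\n1 114029 17\n2 284 1624\n";
open my $fh, '<', \$text or die "Can't open in-memory file: $!";
my $table = load_table($fh);
close $fh;

print join( ",", sort keys %$table ), "\n";    # prints 1,2
```

Calling the helper once per file keeps the parsing rule in one place, so a change to the expected line format only has to be made once.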
Upvotes: 0
Reputation: 54381
After reading your files, the two hashes look like this. I created the output using Data::Dump's function dd.
my %hash_gl = (
1 => [ 114029, 17 ],
2 => [ 284, 1624 ],
3 => [ 1803, 2942 ],
4 => [ 3070, 3282 ],
5 => [ 3295, 442 ],
KEY_gl => [ "Start_gl", "End_gl" ],
);
my %hash_gm = (
1 => [ 115000, 17 ],
2 => [ 284, 1624 ],
3 => [ 1803, 2942 ],
4 => [ 3070, 3282 ],
5 => [ 3295, 4422 ],
KEY_gm => [ "Start_gm", "End_gm" ],
);
As you can see, the values are array refs. You put them in array refs when saying $hash_gl{$key_gl} = [ $start_gl, $end_gl ]; (and the same for gm).
When you compare the two, you are using ==, which is used for numerical comparison. If you print one of the $hash_gm{$_} values, you will get something like this:
ARRAY(0x3bb114)
That's because it's an array ref. Comparing two array refs with == only compares their memory addresses, not their contents, so two separately created arrays are never equal, even if they hold the same values.
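To make the point about == concrete, here is a small sketch. The variable names and the `same_ref` helper are made up for illustration; the numbers are taken from key 2 of your data.

```perl
use strict;
use warnings;

# Two distinct anonymous arrays with identical contents.
my $first  = [ 284, 1624 ];
my $second = [ 284, 1624 ];
my $alias  = $first;    # a second reference to the very same array

# Hypothetical helper: in numeric context a reference evaluates to its
# memory address, so == only answers "is this the same array?".
sub same_ref {
    my ( $x, $y ) = @_;
    return $x == $y ? 1 : 0;
}

print "first vs second: ", same_ref( $first, $second ), "\n";    # prints 0
print "first vs alias:  ", same_ref( $first, $alias ),  "\n";    # prints 1
```

This is why your loop reports "values are not equal" for every key, including the rows whose numbers are identical in both files.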
You now have two possibilities:
you can do the comparison yourself; to do that, you need to go into the array ref and compare each value:
if ( $hash_gm{$_}->[0] == $hash_gl{$_}->[0]
&& $hash_gm{$_}->[1] == $hash_gl{$_}->[1] )
{
print "$_: values are equal\n";
} else {
print "$_: values are not equal\n";
}
you can install and use Array::Utils
use Array::Utils 'array_diff';
# later...
if (! array_diff( @{ $hash_gm{$_} }, @{ $hash_gl{$_} } )) {
print "$_: values are equal\n";
} else {
print "$_: values are not equal\n";
}
I would go with the first solution. It is more readable because you do not need the dereferencing, and installing a module just to save half a line of code is not worth the effort.
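If the arrays could ever hold more than two values, the first approach can be wrapped in a small loop. This is only a sketch under assumptions: `arrays_equal` is a hypothetical helper name, the values are assumed to be numeric, and the hashes are shortened to two rows of your data.

```perl
use strict;
use warnings;

# Hypothetical helper: true if both array refs hold the same numbers
# in the same order.
sub arrays_equal {
    my ( $left, $right ) = @_;
    return 0 unless @$left == @$right;    # different lengths
    for my $i ( 0 .. $#$left ) {
        return 0 unless $left->[$i] == $right->[$i];
    }
    return 1;
}

my %hash_gl = ( 1 => [ 114029, 17 ], 2 => [ 284, 1624 ] );
my %hash_gm = ( 1 => [ 115000, 17 ], 2 => [ 284, 1624 ] );

for my $key ( sort { $a <=> $b } keys %hash_gl ) {
    next unless exists $hash_gm{$key};
    my $verdict = arrays_equal( $hash_gl{$key}, $hash_gm{$key} ) ? "equal" : "not equal";
    print "$key: values are $verdict\n";
}
# prints:
# 1: values are not equal
# 2: values are equal
```

Because the helper uses ==, the header rows (KEY_gl, KEY_gm) should be kept out of the hashes, otherwise Perl warns about non-numeric values.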
Upvotes: 1