Reputation: 19
I am relatively new to Perl and have been stuck on this for some days. I hope you can help me.
I work with two files, which I will simplify here because I had to process them beforehand:
file_one, with a list of names (name_1, name_2, name_3...) and the numbers (number_1, number_2, number_3...) respectively associated with them.
And file_two, with a list of numbers (number_2 and number_6) and items (item_a, item_b associated with number_2, and item_b, item_c associated with number_6).
My idea was to make hashes of both files and combine them. The point where I get stuck is when I need to put the list of items into the hash (a hash of arrays) and then use it. So the first hash works fine, but the second one has the issue.
I tried using push (@{ $hash2{$numbers} }, $items), but then I do not know how to combine it with the other hash because of the reference I used.
The final task would be to compare two names in order to find which items they share. It would be great if it could be done in plain Perl, using no modules if possible.
Thank you very much
Upvotes: 1
Views: 82
Reputation: 165456
If I'm understanding you correctly, you have this:
foo => 1
bar => 2
baz => 3
Then you have:
2 => a, b
3 => b, c
And you want to know what items bar and baz share (for example).
One option is to put them into two tables in a SQLite database and use SQL. This can be the simplest, most flexible, and most performant way to deal with relational data like this, especially if there's a lot of it and you want to do a lot of different searches on it. It avoids having to write a bunch of custom code and maintain a data structure that will probably get increasingly complicated.
Doing it in Perl, here's a sketch.
First, read in the second file, which contains the leaves (the items don't point to anything else), into a hash of arrays. You wind up with a structure like:
$nums2items{2} = [qw(a b)];
Then read the first file into a hash, but instead of storing the numbers as values, store the array reference that %nums2items holds for that number.
$names2items{foo} = $nums2items{1};
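Those two steps could be sketched roughly as follows. The question does not show the real file layout, so the `name, number` and `number, item item ...` line formats, the sample data, and the in-memory filehandles standing in for the files are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Assumed sample data; in-memory strings stand in for the real files.
my $file_two = "2, a b\n3, b c\n";
my $file_one = "foo, 1\nbar, 2\nbaz, 3\n";

# Step 1: number => [items], read from the second file.
my %nums2items;
open my $fh2, '<', \$file_two or die "Cannot open file_two: $!";
while ( my $line = <$fh2> ) {
    chomp $line;
    my ( $num, $items ) = split /\s*,\s*/, $line, 2;
    push @{ $nums2items{$num} }, split ' ', $items;
}
close $fh2;

# Step 2: name => the same array reference that is stored under its number.
my %names2items;
open my $fh1, '<', \$file_one or die "Cannot open file_one: $!";
while ( my $line = <$fh1> ) {
    chomp $line;
    my ( $name, $num ) = split /\s*,\s*/, $line;
    # A name whose number has no items gets an empty list.
    $names2items{$name} = $nums2items{$num} || [];
}
close $fh1;

print "$_: @{ $names2items{$_} }\n" for sort keys %names2items;
```

Because the values are shared references, no data is copied: `$names2items{bar}` and `$nums2items{2}` point at the same array.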
Now if you want to check if bar and baz share anything, you can get the arrays and find their intersection with Array::Utils.
use Array::Utils qw(intersect);
print join ", ", intersect( @{$names2items{bar}}, @{$names2items{baz}});
If you're going to do this a lot, and the order of the items does not matter, it is more efficient to store the items as a hash. This avoids having to sort and compare two lists. It's what intersect does anyway: it turns one list into a hash (or a set) and compares it against the other list.
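Since the question asks for a module-free solution, the same idea can be written by hand. This is only a sketch: `shared_items` is a hypothetical helper name, not part of Array::Utils, and the sample lists are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the items present in both array references.
sub shared_items {
    my ( $first, $second ) = @_;
    my %seen = map { $_ => 1 } @$first;    # turn the first list into a set
    my %done;                              # guards against duplicates in the second list
    return grep { $seen{$_} && !$done{$_}++ } @$second;
}

my @bar = qw(a b);
my @baz = qw(b c);
print join( ", ", shared_items( \@bar, \@baz ) ), "\n";    # prints "b"
```

The result keeps the order of the second list, and each lookup in `%seen` is a single hash access, so the whole intersection is linear in the combined length of the two lists.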
use strict;
use warnings;
use v5.10;
my %nums2items = (
    2 => { a => 1, b => 1, d => 1 },
    3 => { b => 1, c => 1, d => 1, e => 1 },
);
my %names2nums = (
    bar => $nums2items{2},
    baz => $nums2items{3},
);
# Take the intersection in O(n) time.
say join ", ", grep { $names2nums{bar}{$_} } keys %{$names2nums{baz}};
Using a hash like that, where the key is the thing and the value is 1, is a very common and efficient way of representing a set.
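To connect this with the `push` from the question: a hash of sets can be filled one item at a time in much the same way. The sketch below assumes the (number, item) pairs have already been parsed out of the second file; the `@pairs` data is invented for illustration.

```perl
use strict;
use warnings;

# Assumed input: (number, item) pairs as they might come out of the second file.
my @pairs = ( [ 2, 'a' ], [ 2, 'b' ], [ 3, 'b' ], [ 3, 'c' ], [ 3, 'b' ] );

my %nums2items;
for my $pair (@pairs) {
    my ( $number, $item ) = @$pair;
    # Autovivification creates the inner hash on first use;
    # a repeated item just overwrites the same key.
    $nums2items{$number}{$item} = 1;
}

# Membership test and listing of one set.
print "3 contains b\n" if exists $nums2items{3}{b};
print join( ", ", sort keys %{ $nums2items{3} } ), "\n";
```

Compared with the array version, duplicates disappear for free, at the cost of losing the original item order.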
Or you can use the Set::Tiny module. It's very straightforward. If you want to learn how to work with sets in Perl, I highly recommend reading its source.
Upvotes: 2
Reputation: 1509
From your comment to Schwern it appears your files look something like this:
foo, 1
bar, 2
biz, 3
bas, 4
and
1, jacks blue horse
2, the green horse
3, jacks
4, bing
and you are successfully reading them into two hashes, with the text before the comma as the key and the text after it as the value. Now you want to take the names pairwise and print out the words they have in common. You don't want to use any modules, but to do it in raw Perl.
First, why isn't the second an array of arrays instead of a hash if it is numerically keyed?
Second, why are you merging them? Why not use nested loops:
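# Assumes %hash_1 maps name => number and %hash_2 maps
# number => array reference of words.
my @key_list = keys %hash_1;
while ( @key_list )
{
    # Pair the current name with every name still left in the list.
    my $curr_key = shift @key_list;
    for my $next_key ( @key_list )
    {
        my @curr_list = @{$hash_2{$hash_1{$curr_key}}};
        my @next_list = @{$hash_2{$hash_1{$next_key}}};
        while ( @curr_list )
        {
            # Compare each word of the first list with every word of the second.
            my $curr_word = shift @curr_list;
            for my $next_word ( @next_list )
            {
                print "$curr_key and $next_key share $curr_word\n"
                    if $curr_word eq $next_word;
            }
        }
    }
}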
It's a bit brute force, but it would get the job done. Alternatively, you could use one of the excellent Set:: modules. Part of knowing and using a modern language like Perl or C++ well is knowing the standard and common libraries and using them.
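If the hash values are still the plain strings from the sample files above, rather than array references, a module-free variant might look like the sketch below. The data is copied from the samples, `shared_words` is a hypothetical helper name, and splitting on whitespace is an assumption about how the words are separated.

```perl
use strict;
use warnings;

# Assumed data, matching the sample files above: values are plain strings.
my %hash_1 = ( foo => 1, bar => 2, biz => 3, bas => 4 );
my %hash_2 = (
    1 => 'jacks blue horse',
    2 => 'the green horse',
    3 => 'jacks',
    4 => 'bing',
);

# Hypothetical helper: words shared by two names, in the first name's order.
sub shared_words {
    my ( $name_a, $name_b ) = @_;
    my %in_b = map { $_ => 1 } split ' ', $hash_2{ $hash_1{$name_b} };
    return grep { $in_b{$_} } split ' ', $hash_2{ $hash_1{$name_a} };
}

# Visit every unordered pair of names exactly once.
my @names = sort keys %hash_1;
for my $i ( 0 .. $#names ) {
    for my $j ( $i + 1 .. $#names ) {
        my @shared = shared_words( $names[$i], $names[$j] );
        print "$names[$i] and $names[$j] share @shared\n" if @shared;
    }
}
```

The pairwise loop over names is still quadratic, but the inner word comparison becomes a hash lookup instead of a nested loop.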
Upvotes: 0