Poisson

Reputation: 1623

Perl script searching in hash table

I have written a Perl script that takes two input files:

  1. The first file has, on each line, a phrase followed by a value in parentheses. Here is an example:

    hello all (0.5)
    hi all (0.63)
    good bye all (0.09)
    
  2. The second file has a list of rules. For example:

    hello all -> salut (0.5)
    hello all -> salut à tous (0.5)
    hi all -> salut (0.63)
    good bye all -> au revoir (0.09)
    good bye -> au revoir  (0.09)
    

The script has to read the second file and, for each line, extract the phrase before the arrow (e.g. for the 1st line: hello all) and check whether this phrase is present in the first file (in our example it is).

If it is present, the script writes the whole line hello all -> salut (0.5) to the output. So in this example the output file should be:

hello all -> salut (0.5)
hello all -> salut à tous (0.5)
hi all -> salut (0.63)
good bye all -> au revoir (0.09)

My idea is to put all the contents of the first file into a hash table. Here is my script so far:

#!/usr/bin/perl

use warnings;

my $vocabFile = "file1.txt";
my %hashFR =();
open my $fh_infile, '<', $InFile or die "Can't open $InFile\n";

while ( my $Ligne = <$fh_infile> ) {
  if ( $Ligne =~ /\(/ ) {
    my ($cle, $valeur) = split /\(/, $Ligne;
    say $cle; 
    $h{$cle}  = $valeur;
  }     
}

My question now: how do I extract the phrase just before the arrow and search for it in the hash table?

Thank you for your help

Upvotes: 2

Views: 802

Answers (3)

ThisSuitIsBlackNot

Reputation: 24063

You need to use strict. This would cause your program to fail when it encountered undeclared variables like $InFile (I assume you meant to use $vocabFile). I'm going to ignore those types of issues in the code you posted because you can fix them yourself once you turn on strict.

First, a couple of logic issues with your existing code. You don't seem to actually use the numbers in parentheses that you store as your hash values, but if you ever do want to use them, you should probably get rid of the trailing ):

    my ($cle, $valeur) = split /[()]/, $Ligne;

Next, strip leading and trailing whitespace before using a string as a hash key. You may think "foo" and "foo " are the same word, but Perl doesn't.

$cle =~ s/^\s+//;
$cle =~ s/\s+$//;
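
To see what the split and the trimming do to a single line of the first file, here is a minimal sketch. The helper name parse_vocab_line is made up for illustration and is not part of the question's code.

```perl
use strict;
use warnings;

# Hypothetical helper: split "phrase (value)" into a trimmed key and its value.
sub parse_vocab_line {
    my ($line) = @_;
    my ($cle, $valeur) = split /[()]/, $line;
    return unless defined $valeur;    # no parentheses on this line
    $cle =~ s/^\s+//;                 # strip leading whitespace
    $cle =~ s/\s+$//;                 # strip trailing whitespace
    return ($cle, $valeur);
}

my ($cle, $valeur) = parse_vocab_line("hello all (0.5)\n");
print "[$cle] [$valeur]\n";    # prints "[hello all] [0.5]"
```

Without the two substitutions the key would be "hello all " with a trailing space, and a later lookup of "hello all" would miss it.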

Now, you're already most of the way there. You clearly already know how to read in a file, how to use split, and how to use a hash. You just need to put these all together. Read in the second file:

open my $fh2, "<", "file2" or die "Can't open file2: $!";

while (<$fh2>) {
    chomp;

...get the part before the ->

    my ($left, $right) = split /->/;

...strip leading and trailing whitespace from the key

    $left =~ s/^\s+//;
    $left =~ s/\s+$//;

...and print out the whole line if the key exists in your hash

    print $_, "\n" if exists $hash{$left};

...close the loop, and don't forget to close the filehandle when you're done with it

}

close $fh2;

(Although, as amon points out, this is not strictly necessary, especially since we're reading and not writing. There's a nice PerlMonks thread dealing with this topic.)
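
Putting the steps above together, a rough sketch could look like the following. The sub names trim, read_vocab and filter_rules and the hash name %vocab are my own choices for illustration, and in-memory filehandles stand in for the two input files so the example runs on its own. Swap in your real file names in the open calls.

```perl
use strict;
use warnings;

# Strip leading and trailing whitespace from a string.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

# Read "phrase (value)" lines and return a hash of phrase => value.
sub read_vocab {
    my ($fh) = @_;
    my %vocab;
    while (my $line = <$fh>) {
        my ($cle, $valeur) = split /[()]/, $line;
        next unless defined $valeur;    # skip lines without parentheses
        $vocab{ trim($cle) } = $valeur;
    }
    return %vocab;
}

# Return the rule lines whose left-hand side is a key of the vocabulary hash.
sub filter_rules {
    my ($fh, $vocab) = @_;
    my @kept;
    while (my $line = <$fh>) {
        chomp $line;
        my ($left) = split /->/, $line;
        next unless defined $left;      # skip empty lines
        push @kept, $line if exists $vocab->{ trim($left) };
    }
    return @kept;
}

# Sample data (assumed, taken from the question's example).
my $file1 = "hello all (0.5)\nhi all (0.63)\ngood bye all (0.09)\n";
my $file2 = "hello all -> salut (0.5)\ngood bye -> au revoir (0.09)\n";

open my $fh1, '<', \$file1 or die "Can't open in-memory file1: $!";
open my $fh2, '<', \$file2 or die "Can't open in-memory file2: $!";

my %vocab = read_vocab($fh1);
print "$_\n" for filter_rules($fh2, \%vocab);
# prints "hello all -> salut (0.5)"
```

The "good bye" rule is dropped because only "good bye all" is a key in the hash.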

Upvotes: 2

robert.r

Reputation: 31

#!/usr/bin/perl

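use strict; use warnings;

# Take both file names from the command line and fail loudly if either cannot be opened.
my ($path1, $path2) = @ARGV;
die "Usage: $0 file1 file2\n" unless defined $path2;
open my $FILE_1, '<', $path1 or die "Can't open $path1: $!";
open my $FILE_2, '<', $path2 or die "Can't open $path2: $!";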

my @file1 = <$FILE_1>;
my @file2= <$FILE_2>;

close $FILE_1;
close $FILE_2;
# Store "segments" from the first file in hash:
my %first_file_hash = map { chomp $_; my ($key) = $_ =~ /^(.*?)\s*\(/; defined $key ? ($key => 1) : () } @file1;

my @result;
# Process file2 content:
foreach my $line (@file2) {
    chomp $line;
    # Retrieve "segment" from the line:
    my ($string) = $line =~ /^(.*?)\s+->/;
    # If it is present in file1, store it for future usage:
    if ($string and $first_file_hash{ $string }) {
        push @result, $line;
    }
}

open my $F, '>', 'output.txt' or die "Can't open output.txt: $!";
print $F "$_\n" for @result;
close $F;

print "\nDone!\n";

Run as:

perl script.pl file1.txt file2.txt

Cheers!

Upvotes: 1

Borodin

Reputation: 126722

This can be done very straightforwardly by creating a hash directly from the contents of the first file, and then reading each line of the second, checking the hash to see if it should be printed.

use strict;
use warnings;
use autodie;

my %permitted = do {
  open my $fh, '<', 'f1.txt';
  map { /(.+?)\s+\(/, 1 } <$fh>;
};

open my $fh, '<', 'f2.txt';
while (<$fh>) {
  my ($phrase) = /(.+?)\s+->/;
  print if $permitted{$phrase};
}

output

hello all -> salut (0.5)
hello all -> salut à tous (0.5)
hi all -> salut (0.63)
good bye all -> au revoir (0.09)

Upvotes: 1
