Poisson

Reputation: 1623

Modify input file in Perl

I have written a Perl program that takes two text files as input.

The first file contains sequences and their probabilities, in this format:

good morning 0.5

The second file contains all the words with their probabilities, in this format:

good 0.5
morning 0.6

My script calculates this formula for each sequence:

log( prob(sequence) / ( (prob(word1) - prob(sequence)) * (prob(word2) - prob(sequence)) ) )

The problem is that in some cases prob(sequence) is the same as prob(word1) or prob(word2), so I get "Illegal division by zero".

Is there any way to change the values in the second file by adding a small decimal amount in these cases (smoothing)?

#!/usr/bin/perl
use strict; ## PLE
use warnings;

my $inFile = "file1.txt";
my $outFile ="TEST.txt";
my %hashFR = getVocab("file2.txt");
my @result;

my $bloc = 50000;
my $cmp = 0;

open my $fileIn, '<', $inFile or die "Cannot open $inFile: $!";
while (<$fileIn>) {
    chomp;
    my $flag = 0;
    my $ligne = $_;
    my @words = getWords($ligne);
    if (my $prob = pop @words) {
        $prob  =~ s/\(//g;
        my $probWords = 1;

        foreach my $word (@words) {
            my $probWord;
            if (exists $hashFR{$word}) {
                $probWord = $hashFR{$word};
            }
            $probWords *= $probWord-$prob;
        }

        my $calc = $prob*log2($prob/($probWords));
        my $result10 = sprintf("%.10f", $calc);
        push @result, join(' ',@words) ." (".$result10.")\n";
    }
}

# Write all collected results once the whole input has been read.
{
    $cmp += scalar(@result);
    print "$cmp lines processed\n";
    writeToResultFile($outFile,@result);
    @result = ();
}

sub getWords {
    my ($ligne) = @_;

    my @words = split(' ', $ligne);

    return @words;
}

sub getVocab {
    my ( $filename ) = @_;
    my %hash = ();

    open fileVocab, "<$filename" or die $!;
    while (<fileVocab>) {
        chomp;

        if (2 == (my($mot, $prob) = split( / / ))) {
            $hash{trim($mot)} = trim($prob);
        }
    }
    close fileVocab;
    return %hash;
}

sub writeToResultFile {
    my ($filename,@res) = @_;
    open(INFO, ">>$filename") or die $!;
    foreach ( @res) {
        print INFO $_;
    }
    close INFO;
}
sub log2 {
    my $n = shift;
    return log($n)/log(2);
}

sub trim {
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

Upvotes: 0

Views: 138

Answers (1)

user1126070

Reputation: 5069

You could use exception handling with eval, like this:

my $calc;
eval {
 $calc = $prob*log2($prob/($probWords));
};
if ($@){
  $calc = 0;#or whatever suits you
}

Or more simply (the // defined-or operator requires Perl 5.10 or later):

my $calc = eval { $prob*log2($prob/($probWords)) } // 'NaN';
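
If you would rather smooth the values than fall back to NaN, as the question asks, here is a minimal sketch. It smooths the zero difference in memory rather than rewriting the second file. The $EPSILON constant and the score helper are hypothetical names, not from the original script, and the right constant depends on your data. The eval stays as a safety net because log() also dies when its argument is zero or negative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical smoothing constant (an assumption, not from the original post).
my $EPSILON = 1e-10;

sub log2 {
    my ($n) = @_;
    return log($n) / log(2);
}

# Score one sequence: prob * log2(prob / product of (wordProb - prob)).
# A difference that is (almost) zero is replaced by $EPSILON so the
# division can never hit an exact zero.
sub score {
    my ($prob, @wordProbs) = @_;
    my $denominator = 1;
    for my $wordProb (@wordProbs) {
        my $diff = $wordProb - $prob;
        $diff = $EPSILON if abs($diff) < $EPSILON;
        $denominator *= $diff;
    }
    # log() dies on zero or negative input, so keep the eval around it.
    my $calc = eval { $prob * log2($prob / $denominator) };
    return defined $calc ? $calc : 'NaN';
}

print score(0.5, 0.5, 0.6), "\n";   # prob(sequence) equals prob(word1): smoothed
print score(0.2, 0.5, 0.6), "\n";   # ordinary case
```

In the original script this would replace the body of the foreach loop and the $calc line, with $hashFR{$word} supplying each word probability.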

Upvotes: 2
