I'm new to Perl. I'm reading text from a file and want to replace some words with their French translations. I managed to match single words, but not multi-word expressions (strings) such as "Due Date", and I'm having trouble getting that to work in code.
Code for the word-by-word version:
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # the French words are UTF-8 literals
binmode STDOUT, ':encoding(UTF-8)';

my $filename = 'assign3.txt';
my @lexicon_en = ("Winter", "Date", "Due Date", "Problem", "Summer", "Mark", "Fall", "Assignment", "November");
my @lexicon_fr = ("Hiver", "Date", "Date de Remise", "Problème", "Été", "Point", "Automne", "Devoir", "Novembre");
my $i = 1;

open(my $fh, '<:encoding(UTF-8)', $filename)
    or die "Could not open file '$filename': $!";
while (<$fh>) {
    for my $word (split) {             # split on whitespace: one word at a time
        print " $i. $word\n";
        $i++;
        for my $j (0 .. $#lexicon_en) {    # compare against every English entry
            if ($word eq $lexicon_en[$j]) {
                print "Found one! - j value is $j\n";
            }
        }
    }
}
close $fh;
print "\ndone here!!\n";
Here is the regular expression I'm trying to use:
/\w+\s\w+/
This is my code for strings:
while (<>) {
    chomp;
    print "this is text: $_\n";
    # match the literal phrase; \s+ allows any whitespace between the two words
    if (/\bDue\s+Date\b/) {
        print "yes!!\n";
    }
}
Use \b to match word boundaries instead of \w+\s (word characters followed by one whitespace character) to find where words end.
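A small sketch of the difference, using a made-up sample line (the line and the variable names are illustrative only). It shows that /\w+\s\w+/ matches any pair of adjacent words, while a phrase spelled out between \b anchors matches only that phrase, and that \b also stops "Fall" from matching inside a longer word:

```perl
use strict;
use warnings;

# Made-up sample line for illustration.
my $line = 'The Due Date for Fallout is in the Fall.';

# /\w+\s\w+/ matches ANY two adjacent words, so it finds "The Due",
# not the phrase "Due Date" specifically.
my ($any_pair) = $line =~ /(\w+\s\w+)/;

# Spelling the phrase out between \b anchors targets it exactly.
(my $phrase = $line) =~ s/\bDue Date\b/Date de Remise/g;

# Without \b, "Fall" also matches the start of "Fallout"; with \b it does not.
(my $loose  = $line) =~ s/Fall/Automne/g;
(my $strict = $line) =~ s/\bFall\b/Automne/g;

print "any pair: $any_pair\n";   # The Due
print "phrase:   $phrase\n";
print "loose:    $loose\n";
print "strict:   $strict\n";
```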
Combine Steven Klassen's solution with the approach from "How to replace a set of search/replace pairs?":
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                                # the French translations are UTF-8 literals
binmode STDOUT, ':encoding(UTF-8)';
my %lexicon = (
    'Winter'     => 'Hiver',
    'Date'       => 'Date',
    'Due Date'   => 'Date de Remise',
    'Problem'    => 'Problème',
    'Summer'     => 'Été',
    'Mark'       => 'Point',
    'Fall'       => 'Automne',
    'Assignment' => 'Devoir',
    'November'   => 'Novembre',
);
# add lowercase
for (keys %lexicon) {
    $lexicon{lc($_)} = lc($lexicon{$_});
    print $_ . " " . $lexicon{lc($_)} . "\n";
}
# Combine to one big regexp.
# https://stackoverflow.com/questions/17596917/how-to-replace-a-set-of-search-replace-pairs?answertab=votes#tab-top
# Longest keys first, so a phrase is tried before any shorter key it contains;
# \Q...\E escapes regex metacharacters in the keys.
my $regexp = join '|', map { "\\b\Q$_\E\\b" } sort { length($b) <=> length($a) } keys %lexicon;
my $sample = 'The due date of the assignment is a date in the fall.';
print "sample before: $sample\n";
$sample =~ s/($regexp)/$lexicon{$1}/g;
print "sample after : $sample\n";
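Instead of adding a lower-cased copy of every entry, one option is to store the keys in lower case once, match with /i, and look the translation up with lc($1). The sketch below also builds the hash straight from the question's two parallel arrays with a hash slice; the variable name %by_lc is just an illustrative choice:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # the French words are UTF-8 literals
binmode STDOUT, ':encoding(UTF-8)';

# Start from the question's two parallel arrays.
my @lexicon_en = ("Winter", "Date", "Due Date", "Problem", "Summer", "Mark", "Fall", "Assignment", "November");
my @lexicon_fr = ("Hiver", "Date", "Date de Remise", "Problème", "Été", "Point", "Automne", "Devoir", "Novembre");

# Hash slice: element $i of @lexicon_en (lower-cased) becomes the key
# for element $i of @lexicon_fr.
my %by_lc;
@by_lc{ map { lc($_) } @lexicon_en } = @lexicon_fr;

# Longest keys first; quotemeta escapes regex metacharacters (and the space) in each key.
my $regexp = join '|',
    map  { '\b' . quotemeta($_) . '\b' }
    sort { length($b) <=> length($a) } keys %by_lc;

my $sample = 'The due date of the assignment is a date in the fall.';

# /i matches any capitalization; /e evaluates the replacement as code,
# so the translation is looked up with the lower-cased match.
(my $translated = $sample) =~ s/($regexp)/$by_lc{lc $1}/gie;
print "$translated\n";
```

Unlike the lower-casing loop above, this writes each translation exactly as stored ("Date de Remise" even for "due date"), so making the output follow the input's capitalization remains a separate step.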
I think I understand the challenge you're having. Because "due date" is two words, it has to be matched before any single-word key it contains (here, "date"); otherwise that shorter key can be translated first and break the phrase. One way to deal with that is to order your matches from the most words to the fewest, so that "due date" is handled before "date".
If you convert your arrays to a hash (dictionary), you can order the keys by the number of spaces and then iterate over them to do the actual substitutions:
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                                # the French translations are UTF-8 literals
binmode STDOUT, ':encoding(UTF-8)';
#my @lexicon_en = ("Winter","Date", "Due Date", "Problem", "Summer","Mark","Fall","Assignment","November");
#my @lexicon_fr = ("Hiver", "Date", "Date de Remise","Problème","Été", "Point", "Automne", "Devoir", "Novembre");
# convert your arrays to a hash
my %lexicon = (
    'Winter'     => 'Hiver',
    'Date'       => 'Date',
    'Due Date'   => 'Date de Remise',
    'Problem'    => 'Problème',
    'Summer'     => 'Été',
    'Mark'       => 'Point',
    'Fall'       => 'Automne',
    'Assignment' => 'Devoir',
    'November'   => 'Novembre',
);
# sort the keys on the number of spaces found, most spaces first:
# tr/ // counts the spaces, and <=> returns the -1/0/1 that sort expects
my @ordered_keys = sort { ($b =~ tr/ //) <=> ($a =~ tr/ //) } keys %lexicon;
my $sample = 'The due date of the assignment is a date in the fall.';
print "sample before: $sample\n";
foreach my $key (@ordered_keys) {
    # \b stops "Fall" matching inside a longer word; \Q...\E escapes metacharacters
    $sample =~ s/\b\Q$key\E\b/$lexicon{$key}/ig;
}
print "sample after : $sample\n";
The output:
sample before: The due date of the assignment is a date in the fall.
sample after : The Date de Remise of the Devoir is a Date in the Automne.
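The ordering step depends on the sort block returning a negative number, zero, or a positive number: a boolean `<` comparison does not provide that, and `$a =~ / /g` in the scalar context that `<` imposes performs a single match rather than counting spaces. A minimal sketch of sorting keys by space count, most spaces first; "Due Date Extension" is a made-up key added only to show a three-word phrase:

```perl
use strict;
use warnings;

# "Due Date Extension" is a hypothetical entry, not part of the lexicon.
my @keys = ('Fall', 'Due Date', 'Date', 'Due Date Extension');

# tr/ // counts spaces without modifying the string; <=> yields -1, 0 or 1.
# Comparing $b before $a gives descending order (most spaces first).
my @ordered = sort { ($b =~ tr/ //) <=> ($a =~ tr/ //) } @keys;

print join(', ', @ordered), "\n";
```

Keys with the same number of spaces ("Fall" and "Date") may come out in either order, which does not matter here because neither contains the other.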
The next challenge is going to be ensuring that the case of the replacement matches what's being replaced.
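One possible approach, sketched under the assumption that three patterns are enough (ALL CAPS, all lower case, Capitalized): inspect the matched English text and adjust the stored translation to the same pattern. `match_case` is a hypothetical helper, and the small lexicon and sample sentence are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: give $replacement the capitalization pattern of $original.
sub match_case {
    my ($original, $replacement) = @_;
    # no lower-case letters but some upper-case ones: "FALL" -> "AUTOMNE"
    return uc $replacement if $original !~ /\p{Ll}/ && $original =~ /\p{Lu}/;
    # no upper-case letters: "fall" -> "automne"
    return lc $replacement if $original !~ /\p{Lu}/;
    # starts with a capital: "Fall" -> "Automne"
    return ucfirst $replacement if $original =~ /^\p{Lu}/;
    # anything else: keep the translation as stored
    return $replacement;
}

# Lower-cased English keys, so lookups work whatever the input case.
my %by_lc = (
    'due date'   => 'Date de Remise',
    'assignment' => 'Devoir',
    'date'       => 'Date',
    'fall'       => 'Automne',
);

# Longest keys first, each escaped and wrapped in \b...\b.
my $regexp = join '|',
    map  { '\b' . quotemeta($_) . '\b' }
    sort { length($b) <=> length($a) } keys %by_lc;

my $sample = 'Due date: the ASSIGNMENT is due in the fall.';
(my $translated = $sample) =~ s/($regexp)/match_case($1, $by_lc{lc $1})/gie;
print "$translated\n";
```

These rules are a heuristic: a Capitalized match only upper-cases the first letter of the translation, and mixed-case input such as "dUE dATE" falls through to the stored form.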