Reputation: 43
I currently need a regular expression to search for and replace every |–| with |-|. I am already replacing |`| with |'|, and that works using:
while($_ =~ s/`/'/g)
{
    print "Line: '$.'. ";
    print "Found '$&'. ";
}
However, the same approach does not work for any of my attempts below:
while($_ =~ s/\–/-/g)
{
print "Line: '$.'. ";
print "Found '$&'.\n";
}
while($_ =~ s/\–/-/g)
{
print "Line: '$.'. ";
print "Found '$&'.\n";
}
while($_ =~ s/\&ndash/-/g)
{
print "Line: '$.'. ";
print "Found '$&'.\n";
}
while($_ =~ s/\–/-/g)
{
print "Line: '$.'. ";
print "Found '$&'.\n";
}
while($_ =~ s/–/-/g)
{
print "Line: '$.'. ";
print "Found '$&'.\n";
}
while($_ =~ s/&ndash/-/g)
{
print "Line: '$.'. ";
print "Found '$&'.\n";
}
The script currently looks as follows:
#!/usr/bin/perl
use strict;
use warnings;
my $FILE;
my $filename = 'NoDodge.c';
open($FILE,"<service.c") or die "File not opened";
open(my $fh, '>', $filename) or die "Could not open file '$filename' $!";
while (<$FILE>)
{
    while($_ =~ s/`/'/g)
    {
        print "Line: '$.'. ";
        print "Found '$&'. ";
    }
    while($_ =~ s/\–/-/g)
    {
        print "Line: '$.'. ";
        print "Found '$&'.\n";
    }
    print $fh $_;
}
close $fh;
print "\nCompleted\n";
Example of the current result:
Line: '152'. Found '`'.
Line: '162'. Found '`'.
Completed
SOLUTION (provided by Borodin):
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use open qw/ :std :encoding(utf8) /;
my $FILE;
my $fh;
my $readfile = 'service.c';
my $writefile = 'NoDodge.c';
open($FILE,'<',$readfile) or die qq{Unable to open "$readfile" for input: $!};
open($fh, '>',$writefile) or die qq{Unable to open "$writefile" for output: $!};
while (<$FILE>)
{
    while(s/–/-/g)
    {
        print "Found: $& on Line: $.\n";
    }
    while(s/`/'/g)
    {
        print "Found: $& on Line: $.\n";
    }
    print $fh $_;
}
close $fh;
close $FILE;
print "\nService Migrated to $writefile\n";
Example Output:
Found: – on Line: 713
Found: ` on Line: 713
Found: – on Line: 724
Found: ` on Line: 724
Found: ` on Line: 794
Service Migrated to NoDodge.c
Upvotes: 2
Views: 312
Reputation: 126742
You need use utf8 at the top of your program; otherwise Perl will see the individual bytes that make up the UTF-8 encoding of the en-dash (E2 80 93). There's also no need to specify $_ as the object of the substitution, as it is the default, and you needn't escape an en-dash, as it's not a special character within regex patterns:
use utf8;
...
while( s/–/-/g ) { ... }
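To see why the byte-versus-character distinction matters, here is a minimal sketch that encodes the en-dash and lists its bytes. It is only an illustration: the variable names are mine, and the character is written as the escape \x{2013} so the snippet itself stays pure ASCII and does not depend on use utf8.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+2013 EN DASH as one character, written as an escape so this
# source file contains only ASCII.
my $en_dash = "\x{2013}";

# Encode the character to UTF-8 and list the resulting bytes in hex.
my $bytes = encode('UTF-8', $en_dash);
my $hex   = join ' ', map { sprintf '%02X', ord } split //, $bytes;

printf "characters: %d, bytes: %d, hex: %s\n",
    length $en_dash, length $bytes, $hex;
# prints: characters: 1, bytes: 3, hex: E2 80 93
```

A pattern holding the three raw bytes can only match a decoded line (where the en-dash is a single character) by accident, which is why the source literal and the file data both have to be treated as UTF-8.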
Or you may want to make it clearer by using Unicode names, as it's far from obvious at a glance what it is you're replacing. In that case you don't need use utf8, as long as you name every non-ASCII character instead of using it literally, like this:
while( s/\N{EN DASH}/-/g ) { ... }
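As a rough, self-contained check of the named-character form, the sketch below runs the substitution on an in-memory string rather than a file. The sample text and variable names are made up for illustration. Note that s///g replaces every match in one pass and returns the number of replacements, so a plain assignment is enough to count them. The use charnames line is only required on Perls older than 5.16, where \N{...} names are not loaded automatically.

```perl
use strict;
use warnings;
use charnames ':full';   # required before Perl 5.16; harmless afterwards

# Hypothetical sample line containing two en-dashes (U+2013).
my $line = "pages 10\x{2013}12 and 20\x{2013}25";

# s///g replaces all matches at once and returns how many it made.
my $count = ($line =~ s/\N{EN DASH}/-/g);

print "$count replacement(s): $line\n";
# prints: 2 replacement(s): pages 10-12 and 20-25
```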
You will also need to open the files -- both input and output -- as UTF-8-encoded. The simplest way is to set UTF-8 as the default mode. You would add this line near the top of your program
use open qw/ :std :encoding(utf8) /;
or you can open each file explicitly as UTF-8-encoded like this
my $filename = 'NoDodge.c';
open my $in_fh, '<:encoding(utf8)', 'service.c'
    or die qq{Unable to open "service.c" for input: $!};
open my $out_fh, '>:encoding(utf8)', $filename
    or die qq{Unable to open "$filename" for output: $!};
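Putting the pieces together, here is one way the whole round trip might look, sketched against a temporary file so it can be run without service.c. The temporary file names, the sample line and the counting style are my own assumptions, not part of the original script. I have also written the layer as :encoding(UTF-8), the strict variant, in place of :encoding(utf8); either spelling decodes valid input the same way.

```perl
use strict;
use warnings;
use charnames ':full';            # for \N{EN DASH} on Perls before 5.16
use File::Temp qw(tempdir);

# Hypothetical scratch files standing in for service.c and NoDodge.c.
my $dir = tempdir(CLEANUP => 1);
my ($in_name, $out_name) = ("$dir/in.c", "$dir/out.c");

# Create sample input containing an en-dash and two backticks, as UTF-8.
open my $seed_fh, '>:encoding(UTF-8)', $in_name
    or die qq{Unable to open "$in_name" for output: $!};
print {$seed_fh} "a \x{2013} b `c`\n";
close $seed_fh;

open my $in_fh, '<:encoding(UTF-8)', $in_name
    or die qq{Unable to open "$in_name" for input: $!};
open my $out_fh, '>:encoding(UTF-8)', $out_name
    or die qq{Unable to open "$out_name" for output: $!};

while (my $line = <$in_fh>) {
    # Both operators return the number of characters they changed.
    my $dashes = ($line =~ s/\N{EN DASH}/-/g) || 0;
    my $ticks  = ($line =~ tr/`/'/);
    print "Line $.: $dashes en-dash(es), $ticks backtick(s)\n"
        if $dashes || $ticks;
    print {$out_fh} $line;
}
close $in_fh;
close $out_fh;

# Read the result back as raw bytes to confirm only ASCII remains.
open my $check_fh, '<:raw', $out_name
    or die qq{Unable to open "$out_name" for input: $!};
my $result = do { local $/; <$check_fh> };
close $check_fh;
print "non-ASCII bytes left: ", ($result =~ /[^\x00-\x7F]/ ? "yes" : "no"), "\n";
```

Counting with the return values avoids the while loop around s///g, which reports at most one match per line because the first pass already replaces everything.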
Upvotes: 4