Reputation: 832
Apologies if this is a dupe (I tried all manner of searches!). This is driving me nuts...
I need a quick fix to replace à with a space.
I've tried the following, with no success:
$str =~ s/Ã/ /g;
$str =~ s/\xC3/ /g;
What am I doing wrong here?
Upvotes: 0
Views: 258
Reputation: 949
Following this, for the simple "quick fix" Wonko was looking for:
tr/ -~//cd;
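To spell out what that one-liner does: `c` complements the search list (space through tilde, i.e. printable ASCII) and `d` deletes whatever matches, so every character outside printable ASCII is removed. It deletes rather than inserting a space, and it also strips newlines and tabs. A small sketch of both behaviours follows; the helper names `strip_non_ascii` and `blank_non_ascii` and the sample string are made up for illustration and are not from the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: delete everything that is not printable ASCII.
sub strip_non_ascii {
    my ($text) = @_;
    # c = complement the search list, d = delete the matched characters
    (my $clean = $text) =~ tr/ -~//cd;
    return $clean;
}

# Hypothetical variant: turn each such character into a space instead,
# and keep newlines by adding \n to the search list.
sub blank_non_ascii {
    my ($text) = @_;
    (my $clean = $text) =~ tr/\n -~/ /c;
    return $clean;
}

my $sample = "caf\x{C3} au lait\n";    # made-up sample containing U+00C3
print strip_non_ascii($sample), "\n";
print blank_non_ascii($sample);
```

The second form is probably closer to what the question asked for, since it leaves a space where the unwanted character was.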
Upvotes: 0
Reputation: 159
You need to tell Perl to decode the input as Unicode (UTF-8 here) rather than Latin-1 or another encoding. If you're reading from a file:
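#!/usr/bin/perl
use strict;
use warnings;
# Decode the input as UTF-8, and encode the output the same way.
open(INFILE, '<:encoding(UTF-8)', '/mypath/file') or die "Cannot open file: $!";
binmode(STDOUT, ':encoding(UTF-8)');
while (<INFILE>) {
    s/\xc3/ /g;    # after decoding, \xc3 matches the character U+00C3
    print;
}
close(INFILE);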
I'll break that down for you: in <:encoding(UTF-8) you are specifying that you want to read (the <) and that you want Unicode (the :encoding(UTF-8) part).
If you weren't using unicode you would use:
open INFILE, '<', '/mypath/file';
or
open INFILE, '/mypath/file';
because by default Perl opens a file for reading. If you want to write, you use >:encoding(UTF-8), and if you want to append (because the > overwrites the file) you use >>:encoding(UTF-8).
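As a rough sketch of the three modes described above, the following writes, appends to, and then reads back a file with the UTF-8 layer applied each time. The temporary file (via the core File::Temp module) and the sample lines are assumptions made so the sketch does not touch real data; substitute your own path.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Throwaway file for the demonstration; removed automatically at exit.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# '>' truncates the file; text is encoded as UTF-8 on the way out.
open my $out, '>:encoding(UTF-8)', $path or die "Cannot write $path: $!";
print {$out} "first \x{C3} line\n";
close $out;

# '>>' appends to the existing content instead of overwriting it.
open my $app, '>>:encoding(UTF-8)', $path or die "Cannot append to $path: $!";
print {$app} "second line\n";
close $app;

# '<' reads; the bytes are decoded from UTF-8 into characters.
open my $in, '<:encoding(UTF-8)', $path or die "Cannot read $path: $!";
my @lines = <$in>;
close $in;

# With decoded input, \x{C3} matches the single character U+00C3.
s/\x{C3}/ /g for @lines;
print @lines;
```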
Hope it helped!
There is another answer that shows how to use binmode(STDIN, ":utf8") if you're trying to read Unicode from STDIN.
Upvotes: 0
Reputation: 118445
The statement "replace à with a space" is meaningless, because the statement does not specify which encoding is used for the character in question.
The character could be encoded in UTF-8, for example, or in one of several ISO-8859 encodings. Or maybe even UTF-16 or UTF-32.
So, for starters, you need to specify, at least, which encoding you are using. And after that, it's also necessary to specify where the input or the output is coming from.
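To illustrate why the encoding matters, here is a minimal sketch that uses the core Encode module to show the bytes U+00C3 turns into under three encodings, and then does the replacement on decoded text. The helper names `hex_bytes` and `replace_a_tilde` are hypothetical, and the sketch assumes the data is UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $char = "\x{C3}";    # U+00C3, LATIN CAPITAL LETTER A WITH TILDE

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

# The same character becomes different bytes in each encoding.
for my $enc ('ISO-8859-1', 'UTF-8', 'UTF-16BE') {
    printf "%-10s %s\n", $enc, hex_bytes(encode($enc, $char));
}

# Hypothetical helper, assuming UTF-8 input: decode, substitute, re-encode.
sub replace_a_tilde {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);
    $text =~ s/\x{C3}/ /g;
    return encode('UTF-8', $text);
}

print replace_a_tilde("A\xC3\x83B"), "\n";
```

In UTF-8 the character is the two bytes C3 83, so a pattern like s/\xC3/ /g applied to undecoded bytes would replace only the first byte and leave a stray 83 behind.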
Assuming:
1) You are using UTF-8 encoding
2) You are reading/writing STDIN and STDOUT
Then here's a short example of a filter that shows how to replace this character with a space. It assumes, of course, that the Perl script itself is also encoded in UTF-8.
use utf8;
use feature 'unicode_strings';
binmode(STDIN, ":utf8");
binmode(STDOUT, ":utf8");
while (<STDIN>)
{
    s/Ã/ /g;
    print;
}
Upvotes: 6