Wonko the Sane

Reputation: 832

How to replace à with a space using perl

Apologies if this is a dupe (I tried all manner of searches!). This is driving me nuts...

I need a quick fix to replace à with a space.

I've tried the following, with no success:

$str =~ s/Ã/ /g;
$str =~ s/\xC3/ /g;

What am I doing wrong here?

Upvotes: 0

Views: 258

Answers (3)

lzc

Reputation: 949

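Following this, for the simple "quick fix" Wonko was looking for. Note that it deletes every character outside printable ASCII (space through ~) instead of replacing it with a space: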

tr/ -~//cd;
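A minimal sketch of what that tr does, assuming the string has already been decoded into characters. With /c the search list is complemented (everything outside space through ~), and /d deletes those characters instead of replacing them, so newlines and tabs disappear too. Dropping /d and giving a single space as the replacement list replaces each such character with a space, which is closer to what the question asks for. The two sub names are hypothetical helpers for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: delete every character outside printable ASCII (0x20..0x7E).
# Newlines and tabs are outside that range, so they are deleted as well.
sub strip_non_ascii {
    my ($s) = @_;
    $s =~ tr/ -~//cd;
    return $s;
}

# Hypothetical helper: replace each such character with a space instead.
# Without /d, the single space in the replacement list is reused for every match.
sub blank_non_ascii {
    my ($s) = @_;
    $s =~ tr/ -~/ /c;
    return $s;
}

my $str = "A\x{C3}B\n";    # "AÃB" followed by a newline
print strip_non_ascii($str), "\n";
print blank_non_ascii($str), "\n";
```

If the trailing newline should survive, chomp the line first and add it back when printing.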

Upvotes: 0

Spenser Truex

Reputation: 159

You need to tell Perl to decode the input as Unicode (UTF-8) instead of treating it as Latin-1 bytes (or some other encoding). If you're reading from a file, then:

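#!/usr/bin/perl
use strict;
use warnings;
open my $in, '<:encoding(UTF-8)', '/mypath/file'
    or die "Cannot open /mypath/file: $!";
binmode STDOUT, ':encoding(UTF-8)';   # encode the output as UTF-8 too
while (<$in>) {
    s/\xc3/ /g;   # after decoding, \xc3 is the single character Ã (U+00C3)
    print;
}
close $in;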

I'll break that down better for you:

In <:encoding(UTF-8) the < says you want to read, and the :encoding(UTF-8) layer says the bytes in the file should be decoded from UTF-8 into Unicode characters. If you weren't dealing with Unicode, you would use:

open my $in, '<', '/mypath/file';

or

open my $in, '/mypath/file';

because Perl opens a file for reading by default. To write, use >:encoding(UTF-8); to append (since > overwrites the file), use >>:encoding(UTF-8). Hope it helped!
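A minimal sketch of the three modes together, assuming a scratch file created with the core File::Temp module (a temporary path, not /mypath/file): it writes with >, appends with >>, then reads back with < and replaces Ã with a space.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Temporary scratch file for this sketch; removed automatically at exit.
my (undef, $path) = tempfile(UNLINK => 1);

open my $out, '>:encoding(UTF-8)', $path or die "Cannot write $path: $!";
print {$out} "A\x{C3}B\n";     # '>' truncates the file, then writes
close $out or die "Cannot close $path: $!";

open my $app, '>>:encoding(UTF-8)', $path or die "Cannot append to $path: $!";
print {$app} "C\x{C3}D\n";     # '>>' adds to the end instead of truncating
close $app or die "Cannot close $path: $!";

open my $in, '<:encoding(UTF-8)', $path or die "Cannot read $path: $!";
my @lines = <$in>;             # each line is decoded back into characters
close $in;

s/\x{C3}/ /g for @lines;       # \x{C3} is the character Ã after decoding
print @lines;
```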

Another answer shows how to use binmode(STDIN, ":utf8") if you're reading Unicode from STDIN.
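To see why decoding matters, here is a minimal sketch, assuming the input holds Ã encoded as UTF-8 (the two bytes C3 83). On the undecoded bytes, s/\xC3/ /g only hits the first byte and leaves a stray 0x83 behind, which is one plausible reason the substitutions in the question seemed not to work. After Encode's decode, the same pattern matches the whole character. The variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes = "A\xC3\x83B";                  # "AÃB" as raw UTF-8 bytes (4 bytes)

# Without decoding, \xC3 matches only the lead byte of the two-byte sequence.
(my $raw_result = $bytes) =~ s/\xC3/ /g;   # "A \x83B": a stray 0x83 byte remains

# After decoding, the string holds characters and \xC3 is the single character Ã.
my $chars = decode('UTF-8', $bytes);       # "A\x{C3}B" (3 characters)
(my $decoded_result = $chars) =~ s/\xC3/ /g;

printf "raw: %d chars, decoded: %d chars\n",
    length $raw_result, length $decoded_result;
```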

Upvotes: 0

Sam Varshavchik

Reputation: 118445

The statement "replace à with a space" is meaningless, because the statement does not specify which encoding is used for the character in question.

The context of this statement could be using the UTF-8 encoding, for example, as well as one of several ISO-8859 encodings. Or, maybe even UTF-16 or UTF-32.

So, for starters, you need to specify, at least, which encoding you are using. After that, you also need to specify where the input comes from and where the output goes.
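A small sketch of why the encoding has to be stated: the same character, U+00C3 (Ã), becomes a different byte sequence in each encoding. It uses Encode's encode; the big-endian UTF-16/UTF-32 variants are chosen here only so no byte order mark is added.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Encode the single character U+00C3 (Ã) in several encodings.
my %bytes_for;
for my $enc (qw(UTF-8 ISO-8859-1 UTF-16BE UTF-32BE)) {
    $bytes_for{$enc} = encode($enc, "\x{C3}");
}

for my $enc (sort keys %bytes_for) {
    # unpack 'H*' renders the raw bytes as hex digits
    printf "%-10s %s\n", $enc, unpack('H*', $bytes_for{$enc});
}
```

So a byte-level pattern such as \xC3 only lines up with Ã in a single-byte encoding like ISO-8859-1.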

Assuming:

1) You are using UTF-8 encoding

2) You are reading/writing STDIN and STDOUT

Then here's a short example of a filter that replaces this character with a space, assuming, of course, that the Perl script itself is also saved as UTF-8.

use utf8;
use feature 'unicode_strings';

binmode(STDIN, ":utf8");
binmode(STDOUT, ":utf8");

while (<STDIN>)
{
    s/Ã/ /g;
    print;
}
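Why the UTF-8 source assumption matters can be sketched like this, assuming the script file is saved as UTF-8. Since use utf8 is lexically scoped, the two do blocks compile the same literal Ã with and without it. Without use utf8 the literal is two characters (the bytes C3 83), so s/Ã/ /g would search for that two-character sequence and never match a single decoded Ã.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumes this file is saved as UTF-8, so the literal Ã below is stored
# in the file as the two bytes C3 83.
my $with_utf8 = do {
    use utf8;       # string literals in this block are decoded as UTF-8
    "Ã";
};
my $without_utf8 = do {
    no utf8;        # string literals in this block keep their raw bytes
    "Ã";
};

printf "with use utf8: %d char(s); without: %d char(s)\n",
    length $with_utf8, length $without_utf8;
```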

Upvotes: 6
