Reputation: 4199
Problem: I have a string, say 'Buna$002C_TexasBuna$002C_Texas', in which each $
is followed by a four-digit hexadecimal Unicode code point. I want to replace each of these code points with the character it represents.
In Perl, if a code point is written in the form "\x{002C}"
then it is converted to its respective Unicode character. Below is the sample code.
#!/usr/bin/perl
my $string = "Hello \x{263A}!\n";
@arr= split //,$string;
print "@arr";
I am processing a file that contains 10 million records, so I have these strings in a scalar variable. To do the same as above, I am replacing $4_digit_unicode
with \x{4_digit_unicode}
as below.
$str = 'Buna$002C_TexasBuna$002C_Texas';
$str =~s/\$(.{4})/\\x\{$1\}/g;
$str = "$str";
It gives me
Buna\x{002C}_TexasBuna\x{002C}_Texas
This is because in the line $str = "$str"
the variable $str
is interpolated, but its contents are not. So \x{002C}
is never interpreted by Perl.
Is there a way to force Perl to interpolate the contents of $str
as well?
OR
Is there another method to achieve this? I do not want to extract each code point, pack it using pack "U4",0x002C
and then substitute it back. But something in one line (like the unsuccessful attempt below) is OK.
$str =~ s/\$(.{4})/pack("U4",$1)/g;
I know the above is wrong; but can I do something like above?
For the input string $str = 'Buna$002C_TexasBuna$002C_Texas'
the desired output is Buna,_TexasBuna,_Texas
Upvotes: 2
Views: 399
Reputation: 385754
"\x{263A}"
(quotes included) is a string literal: a piece of code that produces a string containing the single character U+263A
when it is evaluated by the interpreter (by being part of the script passed to perl
to be evaluated).
"\\x\{$1\}"
(quotes included), on the other hand, produces a string consisting of \
, x
, {
, the contents of $1
, and }
.
The latter is the string you are producing. You appear to be attempting to produce Perl code, but it's not valid Perl code -- it's missing the quotes -- and you never have the code interpreted by perl
.
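The difference between the two strings can be checked by comparing their lengths. This is only a minimal sketch; the variable names are illustrative and do not come from the question.

```perl
use strict;
use warnings;

# A string literal: perl turns the escape into one character
# while compiling the script.
my $literal = "\x{263A}";

# A string assembled at run time: the backslash, the x, and the
# braces are ordinary characters, and nothing evaluates them.
my $code  = '263A';
my $built = '\x{' . $code . '}';

printf "%d\n", length $literal;   # 1
printf "%d\n", length $built;     # 8
```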
$str =~ s/\$(.{4})/\\x\{$1\}/g;
is short for
$str =~ s/\$(.{4})/ "\\x\{$1\}" /eg;
which is completely different than
$str =~ s/\$(.{4})/ "\x{263A}" /eg;
It looks like you were going for the following:
$str =~ s/\$(.{4})/ eval qq{"\\x\{$1\}"} /eg;
But there are much simpler ways of producing the desired string, such as
$str =~ s/\$(.{4})/ pack "U", hex $1 /eg;
or better yet,
$str =~ s/\$(.{4})/ chr hex $1 /eg;
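As a rough check, the eval variant and the chr hex variant can be run side by side on the sample input. In this sketch the capture is restricted to hexadecimal digits (an addition not present in the snippets above) so that the string eval never sees arbitrary text; the variable names are illustrative.

```perl
use strict;
use warnings;

my $input = 'Buna$002C_TexasBuna$002C_Texas';

# Variant 1: build the source text of a string literal for each
# match and have perl evaluate it.
(my $via_eval = $input) =~ s/\$([0-9A-Fa-f]{4})/ eval qq{"\\x\{$1\}"} /eg;

# Variant 2: convert the hex digits to a number, then to a character.
(my $via_chr = $input) =~ s/\$([0-9A-Fa-f]{4})/ chr hex $1 /eg;

print "$via_eval\n";   # Buna,_TexasBuna,_Texas
print "$via_chr\n";    # Buna,_TexasBuna,_Texas
```

The second variant avoids compiling a new piece of code for every match, which is likely to matter for a file with millions of records.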
Upvotes: 1
Reputation: 14038
This gives the desired result:
use strict;
use warnings;
use feature 'say';
my $str = 'Buna$002C_TexasBuna$002C_Texas';
$str =~s/\$(.{4})/chr(hex($1))/eg;
say $str;
The main interesting item is the e
in s///eg
. The e
means to treat the replacement text as code to be executed. The hex()
converts a string of hexadecimal characters to a number. The chr()
converts a number to a character. The substitution might be better written as below, so that a dollar sign followed by non-hexadecimal characters is left alone.
$str =~s/\$([0-9a-f]{4})/chr(hex($1))/egi;
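The stricter pattern can be wrapped in a small helper to see how it treats dollar signs that are not followed by four hex digits. This is a sketch under assumptions: decode_dollar_hex is a hypothetical name, and the UTF-8 output layer is only an assumption about where the data ends up.

```perl
use strict;
use warnings;

# Hypothetical helper: replaces each $XXXX (exactly four hex digits)
# with the corresponding character and leaves everything else as is.
sub decode_dollar_hex {
    my ($text) = @_;
    $text =~ s/\$([0-9A-Fa-f]{4})/chr(hex($1))/eg;
    return $text;
}

# Assumption: output is wanted as UTF-8. Without an encoding layer,
# characters above 0xFF trigger a "Wide character" warning.
binmode STDOUT, ':encoding(UTF-8)';

print decode_dollar_hex('Buna$002C_TexasBuna$002C_Texas'), "\n";   # Buna,_TexasBuna,_Texas
print decode_dollar_hex('Price: $5 and $zzzz'), "\n";              # Price: $5 and $zzzz
```

For a large file, the same helper could be called once per line inside a while loop over the input filehandle, so that only one record is held in memory at a time.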
Upvotes: 7
Reputation: 19423
You can call functions such as pack
in the replacement string; you just have to use the e
regular expression modifier.
Or you can do this
$str =~ s/\$(.{4})/@{[ pack("U", hex($1)) ]}/g;
If those two options don't work, please let me know. Take a look at this Stack Overflow question for more information.
Upvotes: 1