Lanbo

Reputation: 15692

Hexadecimal Variables in substitution patterns

The file I am getting is full of badly formatted UTF-8 codes, like <0308> etc. I can identify them all right, but I want to replace them with the actual UTF-8 character, preferably with a regex. I've tried dozens of regexes like this:

s/<[0-9a-fA-F]{2,4}/\x{$1}/g
s/<[0-9a-fA-F]{2,4}/\N{U+$1}/g

And so on, but each time Perl tells me that $ is not a valid hex character (with which I fully agree). Shouldn't it just take the number in my $1 and put it in there? Or does Perl really expect me to use \x{..} or \N{U+..} only with fixed values? If so, I'd have to hand-write the conversion for every possible hex value, which is not very useful.

Upvotes: 1

Views: 1582

Answers (3)

mob

Reputation: 118625

For one thing, you need to use parentheses to capture something in your regular expression; otherwise $1 will not get set to anything.

Combining chr and hex with the /e modifier, which evaluates the replacement as Perl code, will do the trick here:

s/ <
   ([0-9a-fA-F]{2,4})     # parentheses to set $1
   > 
 / 
   chr(hex($1)) 
 /gex;        
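For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same chr + hex substitution. The helper name decode_markers and the sample string are made up for illustration and are not from the question; the binmode call is an assumption about how the output should be encoded, added so that printing the result does not trigger a "Wide character in print" warning.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this example): replace every
# <HEX> marker of 2 to 4 hex digits with the character at that code point.
sub decode_markers {
    my ($text) = @_;
    # The parentheses capture the digits into $1; /e evaluates the
    # replacement as Perl code, so chr(hex($1)) runs once per match.
    $text =~ s/<([0-9a-fA-F]{2,4})>/chr(hex($1))/ge;
    return $text;
}

# Assumption: output should be UTF-8 encoded.
binmode(STDOUT, ':encoding(UTF-8)');

my $sample  = "u<0308>ber";    # made-up input: "u", a marker, then "ber"
my $decoded = decode_markers($sample);
print "$decoded\n";
```

The replacement has to be computed at run time because \x{..} and \N{U+..} escapes in a string literal are resolved when the literal is parsed, before $1 has any value.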

Upvotes: 3

Wes Hardaker

Reputation: 22262

You probably need to add the eval switch (/e) to the substitution. Try /\x{$1}/eg or /"\x{$1}"/eg

Upvotes: 1

Dave Sherohman

Reputation: 46197

What version of Perl are you using? This seems to work fine for me on 5.10.1:

$ perl -E '$foo = "<0308>"; $foo =~ s/<[0-9a-fA-F]{2,4}/\N{U+$1}/g; say $foo'
Wide character in print at -e line 1.
�>

(With \x{$1}, it seems to substitute the numbers with nothing, but I still don't get an error message.)

Upvotes: 1
