Konstantin

Reputation: 3123

Replacing %uXXXX with the corresponding Unicode codepoint in Ruby

I have filenames that contain %uXXXX substrings, where XXXX is four hexadecimal digits, for example %u0151. I got these filenames by applying URI.unescape, which replaced the %XX substrings with the corresponding characters but left the %uXXXX substrings untouched. I would like to replace them with the corresponding Unicode characters using String#gsub. I tried the following, without success:

"rep%u00fcl%u0151".gsub(/%u([0-9a-fA-F]{4,4})/,'\u\1')

I get this:

"rep\\u00fcl\\u0151"

Instead of this:

"repülő"

Upvotes: 4

Views: 1970

Answers (2)

joelparkerhenderson

Reputation: 35443

Try this code:

string.gsub(/%u([0-9A-F]{4})/i){[$1.hex].pack("U")}
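As a rough illustration of why the block form helps where the replacement string from the question did not: a `\u` escape is only interpreted by the Ruby parser inside a double-quoted literal, so `'\u\1'` reaches gsub as a plain backslash and a "u" followed by the backreference. The block instead runs Ruby code for every match. The variable names `s`, `literal` and `decoded` below are only for illustration.

```ruby
s = "rep%u00fcl%u0151"

# A single-quoted replacement keeps "\u" as two literal characters, so
# gsub only splices the captured hex digits after a backslash and a "u".
literal = s.gsub(/%u([0-9A-F]{4})/i, '\u\1')
# literal now contains real backslashes: rep\u00fcl\u0151

# The block form runs per match: $1 holds the four hex digits,
# String#hex turns them into an Integer, and pack("U") encodes that
# codepoint as a UTF-8 string.
decoded = s.gsub(/%u([0-9A-F]{4})/i) { [$1.hex].pack("U") }
```

Here `decoded` is the UTF-8 string "repülő", while `literal` is the same text the question reports.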

In the comments, cremno has a better, faster solution:

string.gsub(/%u([0-9A-F]{4})/i){$1.hex.chr(Encoding::UTF_8)}

In the comments, bobince adds important restrictions, worth reading in full.
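The comment itself is not reproduced here. One limitation that is easy to check, assuming the %uXXXX sequences come from a JavaScript-style escape() that emits UTF-16 code units, is that characters outside the Basic Multilingual Plane arrive as two sequences (a surrogate pair), and `Integer#chr` raises a RangeError for a lone surrogate value. A minimal sketch that decodes each run of sequences as UTF-16 instead might look like this; the helper name `unescape_percent_u` is hypothetical.

```ruby
# Hypothetical helper: decodes runs of %uXXXX as UTF-16 code units so
# that surrogate pairs are combined into a single character.
def unescape_percent_u(str)
  str.gsub(/(?:%u[0-9A-Fa-f]{4})+/) do |run|
    # Collect the 16-bit code units of this run as Integers.
    units = run.scan(/%u([0-9A-Fa-f]{4})/).flatten.map(&:hex)
    # Pack them as big-endian 16-bit values, label the bytes as
    # UTF-16BE, then transcode to UTF-8. An unpaired surrogate raises
    # Encoding::InvalidByteSequenceError here.
    units.pack("n*").force_encoding(Encoding::UTF_16BE).encode(Encoding::UTF_8)
  end
end

bmp   = unescape_percent_u("rep%u00fcl%u0151")  # BMP characters only
emoji = unescape_percent_u("%uD83D%uDE00")      # one surrogate pair (U+1F600)
```

Whether this extra handling is needed depends on where the filenames come from. For input limited to BMP characters, the one-line gsub above gives the same result.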

Upvotes: 2

maerics

Reputation: 156424

Following commenter @cremno's suggestion, you can also try this code:

string.gsub(/%u([0-9A-F]{4})/i) { $1.hex.chr(Encoding::UTF_8) }

For example:

s = "rep%u00fcl%u0151"
s.gsub(/%u([0-9A-F]{4})/i) { $1.hex.chr(Encoding::UTF_8) }
# => "repülő"

Upvotes: 1
