I have filenames that contain %uXXXX substrings, where XXXX is four hexadecimal digits, for example %u0151. I got these filenames by applying URI.unescape, which replaced the %XX substrings with the corresponding characters but left the %uXXXX substrings untouched. I would like to replace them with the corresponding Unicode characters using String#gsub. I tried the following, without success:
"rep%u00fcl%u0151".gsub(/%u([0-9a-fA-F]{4,4})/,'\u\1')
I get this:
"rep\\u00fcl\\u0151"
Instead of this:
"repülő"
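Why the attempt fails, as far as the output above shows: the replacement string is copied literally apart from backreferences such as \1, and the result is never re-parsed as a Ruby "\uXXXX" escape. The sketch below reproduces the behaviour; the variable name `broken` is just for illustration.

```ruby
s = "rep%u00fcl%u0151"

# In a single-quoted Ruby string, '\u\1' is the four characters \ u \ 1.
# gsub expands \1 to the captured hex digits but copies "\u" unchanged,
# so the result holds real backslashes rather than decoded characters.
broken = s.gsub(/%u([0-9a-fA-F]{4})/, '\u\1')

p broken            # => "rep\\u00fcl\\u0151"
puts broken         # prints rep\u00fcl\u0151
puts broken.length  # prints 16
```

Turning the digits into a character requires computing the replacement per match, which is what the block form of gsub allows.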
Try this code:
string.gsub(/%u([0-9A-F]{4})/i){[$1.hex].pack("U")}
In the comments, cremno suggests a faster alternative:
string.gsub(/%u([0-9A-F]{4})/i){$1.hex.chr(Encoding::UTF_8)}
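Both one-liners work the same way: String#hex turns the four captured digits into an Integer code point, and that Integer is then turned into a one-character UTF-8 string, either with Array#pack and the "U" directive or with Integer#chr given an explicit encoding. A minimal sketch wrapping each variant in a method; the method names `decode_with_pack` and `decode_with_chr` and the constant `PERCENT_U` are hypothetical.

```ruby
# Matches %u followed by exactly four hex digits, case-insensitively.
PERCENT_U = /%u([0-9A-F]{4})/i

# Variant 1: pack the code point as a UTF-8 character ("U" directive).
def decode_with_pack(str)
  str.gsub(PERCENT_U) { [$1.hex].pack("U") }
end

# Variant 2: Integer#chr with an explicit target encoding.
def decode_with_chr(str)
  str.gsub(PERCENT_U) { $1.hex.chr(Encoding::UTF_8) }
end

s = "rep%u00fcl%u0151"
puts decode_with_pack(s)  # prints repülő
puts decode_with_chr(s)   # prints repülő
```

Because of the `i` flag, uppercase `%U00FC` is decoded as well, and text without a complete four-digit escape is left unchanged.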
In the comments, bobince points out important caveats that are worth reading in full.
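One caveat that applies to both one-liners (stated here as an assumption about where such strings come from, not as a summary of bobince's comment): %uXXXX is the format produced by JavaScript's legacy escape() function, which writes characters outside the Basic Multilingual Plane as two %u sequences holding a UTF-16 surrogate pair, for example %uD83D%uDE00 for U+1F600. Integer#chr(Encoding::UTF_8) should reject a lone surrogate value such as 0xD83D with a RangeError. A minimal sketch that instead decodes each run of consecutive escapes as UTF-16; the method name `decode_percent_u` and the constant `PERCENT_U_RUN` are hypothetical.

```ruby
# Matches one or more consecutive %uXXXX escapes as a single run.
PERCENT_U_RUN = /(?:%u[0-9A-Fa-f]{4})+/

# Hypothetical helper: treats each run as UTF-16 code units so that
# surrogate pairs (e.g. %uD83D%uDE00) combine into one character.
def decode_percent_u(str)
  str.gsub(PERCENT_U_RUN) do |run|
    # "%u00fc%u0151" -> [0x00fc, 0x0151]
    units = run.scan(/%u([0-9A-Fa-f]{4})/).flatten.map(&:hex)
    # Pack as big-endian 16-bit units, label the bytes as UTF-16BE,
    # then transcode to UTF-8. An unpaired surrogate should make
    # encode raise Encoding::InvalidByteSequenceError.
    units.pack("n*")
         .force_encoding(Encoding::UTF_16BE)
         .encode(Encoding::UTF_8)
  end
end

puts decode_percent_u("rep%u00fcl%u0151")    # prints repülő
puts decode_percent_u("smile %uD83D%uDE00")  # prints smile 😀
```

Decoding whole runs rather than single escapes is what lets the two halves of a surrogate pair meet before transcoding.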
Following commenter @cremno's suggestion, you can also try this code:
string.gsub(/%u([0-9A-F]{4})/i) { $1.hex.chr(Encoding::UTF_8) }
For example:
s = "rep%u00fcl%u0151"
s.gsub(/%u([0-9A-F]{4})/i) { $1.hex.chr(Encoding::UTF_8) }
# => "repülő"