Reputation: 41383
I have this legacy code snippet, which (apparently) decodes double-encoded UTF-8 text back to normal UTF-8:
# Run with python3!
import codecs
import sys
s = codecs.open('doubleutf8.dat', 'r', 'utf-8').read()
sys.stdout.write(
    s
    .encode('raw_unicode_escape')
    .decode('utf-8')
)
I need to translate it to Lua, and imitate all possible decoding side-effects (if any).
Limitations: I may use any of the available Lua modules for UTF-8 handling, preferably a stable one with LuaRocks support. I will not use Lupa or any other Lua-Python bridge, nor will I call os.execute() to invoke Python.
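To make the term "double-encoded" concrete, here is a minimal Python sketch of how such data could arise and how the legacy steps undo it. The way the damage is produced (UTF-8 bytes misread as Latin-1 and then encoded as UTF-8 again) is an assumption, and the sample string is only an illustration, not the actual contents of doubleutf8.dat.

```python
# Illustrative sample text (assumption: not the real file contents).
original = "нижний новгород"

# Assumed origin of the damage: the UTF-8 bytes were misread as Latin-1
# and the resulting text was encoded as UTF-8 a second time.
double_encoded = original.encode("utf-8").decode("latin-1").encode("utf-8")

# The same two steps as the legacy snippet, applied to in-memory bytes
# instead of a file.
s = double_encoded.decode("utf-8")
repaired = s.encode("raw_unicode_escape").decode("utf-8")

# Report whether the round trip restored the original text.
print(repaired == original)
```

Under that assumption every character of the intermediate string is below U+0100, so raw_unicode_escape simply maps each character back to one byte.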
Upvotes: 3
Views: 1850
Reputation: 16753
You can use lua-iconv, the Lua binding to the iconv library. With it you can convert between character encodings as much as you like.
It is also available in LuaRocks.
Edit: based on this answer, I was able to decode the data correctly with the following Lua code:
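local iconv = require 'iconv'
-- convert from UTF-8 to Latin-1 (iconv.new takes the target encoding first)
local decoder = iconv.new('latin1', 'utf8')
-- read the whole file in binary mode so no bytes are altered
local file = assert(io.open('doubleutf8.dat', 'rb'))
local data = file:read('*a')
file:close()
-- decodedData is encoded in UTF-8; err is non-nil if the conversion failed
local decodedData, err = decoder:iconv(data)
-- if your terminal understands UTF-8, prints "нижний новгород"
-- if not, you can further convert it from UTF-8 to any encoding, like KOI8-R
print(decodedData)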
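As for side effects: the UTF-8 to Latin-1 conversion matches the Python snippet only while every character of the decoded file is below U+0100. The Python sketch below shows where the two diverge. The sample string is made up for illustration, and the remark about iconv is an expectation about typical iconv behavior rather than something tested against lua-iconv.

```python
# Made-up input: a double-encoded "é" (U+00C3 U+00A9) followed by a stray
# euro sign (U+20AC), which has no Latin-1 representation.
s = "caf\u00c3\u00a9 \u20ac"

# raw_unicode_escape keeps characters below U+0100 as single bytes and
# writes anything above as an ASCII \uXXXX escape sequence.
raw = s.encode("raw_unicode_escape")
decoded = raw.decode("utf-8")
print(raw)
print(decoded)

# A strict Latin-1 encode refuses the same input instead of escaping it.
# A UTF-8 to Latin-1 iconv conversion would be expected to report an
# error here as well (assumption, not verified against lua-iconv).
try:
    s.encode("latin-1")
    latin1_failed = False
except UnicodeEncodeError:
    latin1_failed = True
print(latin1_failed)
```

If the input can contain such characters, the Lua port would need to check the error value returned by decoder:iconv and decide whether to emit the literal \uXXXX text itself in order to match the Python output.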
Upvotes: 3