Alexander Gladysh

Reputation: 41383

How to translate double-UTF-8-decoder code in Python to Lua

I have this legacy code snippet, which (apparently) decodes double-encoded UTF-8 text back to normal UTF-8:

# Run with python3!
import codecs
import sys
s=codecs.open('doubleutf8.dat', 'r', 'utf-8').read()
sys.stdout.write(
                s
                .encode('raw_unicode_escape')
                .decode('utf-8')
        )

I need to translate it to Lua and reproduce any decoding side effects it may have.

Constraints: I may use any available Lua module for UTF-8 handling, preferably a stable one with LuaRocks support. I will not use Lupa or any other Lua-Python bridge, nor will I call os.execute() to invoke Python.

Upvotes: 3

Views: 1850

Answers (1)

Michal Kottman

Reputation: 16753

You can use lua-iconv, the Lua binding to the iconv library. With it you can convert between character encodings as much as you like.

It is also available in LuaRocks.

Edit: using this answer I have been able to correctly decode the data using the following Lua code:

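-- newer lua-iconv releases return the module instead of setting a global
local iconv = require 'iconv'
-- iconv.new(to, from): convert from utf8 to latin1
local decoder = iconv.new('latin1', 'utf8')
-- read the whole file in binary mode and close the handle
local file = assert(io.open('doubleutf8.dat', 'rb'))
local data = file:read('*a')
file:close()
-- decodedData is encoded in utf8; err is nil when the conversion succeeded
local decodedData, err = decoder:iconv(data)
assert(err == nil, 'iconv conversion failed')
-- if your terminal understands utf8, prints "нижний новгород"
-- if not, you can further convert it from utf8 to any encoding, like KOI8-R
print(decodedData)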
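
To illustrate why a utf8-to-latin1 conversion undoes the double encoding, and where it may differ from the original Python snippet, here is a minimal Python sketch. The helper names (double_encode, undo_with_latin1, undo_with_raw_unicode_escape) are hypothetical and exist only for this illustration. The sketch assumes the data was damaged by reading UTF-8 bytes as Latin-1 and encoding the result as UTF-8 again.

```python
def double_encode(text: str) -> bytes:
    # Simulate the damage: UTF-8 bytes misread as Latin-1, then encoded as UTF-8 again.
    return text.encode('utf-8').decode('latin-1').encode('utf-8')


def undo_with_latin1(data: bytes) -> str:
    # Same idea as the iconv utf8 -> latin1 conversion, followed by reading the result as UTF-8.
    return data.decode('utf-8').encode('latin-1').decode('utf-8')


def undo_with_raw_unicode_escape(data: bytes) -> str:
    # Same steps as the legacy Python snippet from the question.
    return data.decode('utf-8').encode('raw_unicode_escape').decode('utf-8')


original = 'нижний новгород'
damaged = double_encode(original)

# Both approaches recover the text when every code point is below U+0100.
print(undo_with_latin1(damaged) == original)
print(undo_with_raw_unicode_escape(damaged) == original)

# The two differ for code points above U+00FF, which a Latin-1 misreading cannot produce:
# raw_unicode_escape writes a literal \uXXXX escape, while latin-1 refuses to encode.
print('\u20ac'.encode('raw_unicode_escape'))
try:
    '\u20ac'.encode('latin-1')
except UnicodeEncodeError:
    print('latin-1 cannot encode U+20AC')
```

For input that really is double-encoded UTF-8, the two approaches should agree. The side effect to watch for is stray characters above U+00FF in the input: the Python snippet silently turns them into literal \uXXXX text, whereas a latin1 conversion would be expected to report an error instead, so it is worth checking the error value returned by decoder:iconv() if that case matters. Note also that print() appends a newline, which sys.stdout.write() does not.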

Upvotes: 3
