Tzury Bar Yochay
Tzury Bar Yochay

Reputation: 9004

what is the way to represent a unichar in lua

If I need to have the following python value, unicode char '0':

>>> unichr(0)
u'\x00'

How can I define it in Lua?

Upvotes: 10

Views: 8568

Answers (4)

Ivan Kolmychek
Ivan Kolmychek

Reputation: 1271

For a more modern answer, Lua 5.3 now has the utf8.char:

Receives zero or more integers, converts each one to its corresponding UTF-8 byte sequence and returns a string with the concatenation of all these sequences.

Upvotes: 5

Tzury Bar Yochay
Tzury Bar Yochay

Reputation: 9004

How about

function unichr(ord)
    if ord == nil then return nil end
    if ord < 32 then return string.format('\\x%02x', ord) end
    if ord < 126 then return string.char(ord) end
    if ord < 65539 then return string.format("\\u%04x", ord) end
    if ord < 1114111 then return string.format("\\u%08x", ord) end
end

Upvotes: 6

RBerteig
RBerteig

Reputation: 43366

While native Lua does not directly support or handle Unicode, its strings are really buffers of arbitrary bytes that by convention hold ASCII characters. Since strings may contain any byte values, it is relatively straightforward to build support for Unicode on top of native strings. Should byte buffers prove to be insufficiently robust for the purpose, one can also use a userdata object to hold anything, and with the addition of a suitable metatable, endow it with methods for creation, translation to a desired encoding, concatenation, iteration, and anything else that is needed.

There is a page at the Lua User's Wiki that discusses various ways to handle Unicode in Lua programs.

Upvotes: 3

Nicol Bolas
Nicol Bolas

Reputation: 474326

There isn't one.

Lua has no concept of a Unicode value. Lua has no concept of Unicode at all. All Lua strings are 8-bit sequences of "characters", and all Lua string functions will treat them as such. Lua does not treat strings as having any Unicode encoding; they're just a sequence of bytes.

You can insert an arbitrary number into a string. For example:

"\065\066"

Is equivalent to:

"AB"

The \ notation is followed by 3 digits (or one of the escape characters), which must be less than or equal to 255. Lua is perfectly capable of handling strings with embedded \000 characters.

But you cannot directly insert Unicode codepoints into Lua strings. You can decompose the codepoint into UTF-8 and use the above mechanism to insert the codepoint into a string. For example:

"x\226\131\151"

This is the x character followed by the Unicode combining above arrow character.

But since no Lua functions actually understand UTF-8, you will have to expose some function that expects a UTF-8 string in order for it to be useful in any way.

Upvotes: 10

Related Questions