Dori

Reputation: 18423

Lua String char encoding

I can't see what encoding Lua uses for its strings.

I'm using

string.byte (s [, i [, j]])

which has the doc

Returns the internal numerical codes of the characters s[i], s[i+1], ···, s[j]. The default value for i is 1; the default value for j is i. Note that numerical codes are not necessarily portable across platforms.

From what I've read, people suggest it uses ASCII, which is fine for me, but I don't understand the part about codes changing across platforms. I thought the whole point of using a single encoding (like ASCII) is that this wouldn't happen. Or is the note only there because ASCII defines nothing above 127, so different countries / OEMs / OSs etc. may be using custom extensions from decades ago for the upper range?

It's important for me to know that [a-zA-Z] will have the same char values on all the platforms I'm running on.

The Lua doc could be a bit more specific here!

Any light anyone can shed on this would be great, thanks.

Upvotes: 3

Views: 8994

Answers (1)

Joey

Reputation: 354864

I'm fairly sure you can safely assume an ASCII-derived encoding. So the minuscule set of characters you're interested in stays the same.
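If you'd rather verify that than take my word for it, a small startup check is cheap. This is only a sketch: the function name `letters_are_ascii` is made up for illustration, and the expected values are simply the standard ASCII codes (65 to 90 for A-Z, 97 to 122 for a-z).

```lua
-- Hypothetical helper: returns true when [a-zA-Z] have their ASCII codes.
-- Assumption: on an ASCII-derived platform 'A' is 65 and 'a' is 97,
-- and both alphabets are contiguous.
local function letters_are_ascii()
  local upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
  local lower = "abcdefghijklmnopqrstuvwxyz"
  for i = 1, #upper do
    if string.byte(upper, i) ~= 64 + i then return false end
    if string.byte(lower, i) ~= 96 + i then return false end
  end
  return true
end

if not letters_are_ascii() then
  error("this platform does not use ASCII codes for [a-zA-Z]")
end
```

On anything ASCII-derived the check passes silently. It would only fail on something exotic such as an EBCDIC system, which is presumably the kind of case the manual's note is covering.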

The note about the code changing between platforms likely means that Lua doesn't know anything about the character encoding at all and thus just uses whatever bytes the OS hands out. On Linux this is likely UTF-8, which means you'd have to deal with individual code units when stepping outside ASCII. On Windows I could imagine it being the system's legacy codepage, which means sort-of Latin 1 (CP 1252) in much of the Western world.
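To make the difference concrete, here is a rough illustration of how the same character can show up as different bytes. The strings are written with decimal escapes so the example doesn't depend on how the source file itself is saved. The helper `bytes_of` is just a name I picked, and the byte values are the usual ones for "é" in UTF-8 and in CP 1252.

```lua
-- "é" as it would typically arrive from a UTF-8 environment: two bytes.
local utf8_e_acute = "\195\169"    -- 0xC3 0xA9
-- "é" as it would typically arrive under CP 1252 / Latin 1: one byte.
local cp1252_e_acute = "\233"      -- 0xE9

-- Hypothetical helper: collect every byte of a string into a table.
local function bytes_of(s)
  return { string.byte(s, 1, -1) }
end

local u = bytes_of(utf8_e_acute)
local c = bytes_of(cp1252_e_acute)
print(#u, u[1], u[2])  -- length 2, then the two UTF-8 code units
print(#c, c[1])        -- length 1, then the single CP 1252 byte
```

Lua just counts and returns bytes here, so `#s` and `string.byte` differ between the two even though a user would call both strings the same text. Plain ASCII letters are a single byte with the same value in both encodings, so they are unaffected.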

Upvotes: 5
