71GA

Reputation: 1391

LuaLaTeX - string contains an invalid utf-8 sequence

I am on Debian Linux and I am trying to generate a PDF from information extracted from my database linux_krozki. To do this, I first created the database with the utf8mb4 character set and the utf8mb4_slovenian_ci collation.

Based on this topic, I didn't use the utf8 character set and the utf8_slovenian_ci collation.

MariaDB [(none)]> SHOW CREATE DATABASE linux_krozki;
+--------------+-------------------------------------------------------------------------------------------------------+
| Database     | Create Database                                                                                       |
+--------------+-------------------------------------------------------------------------------------------------------+
| linux_krozki | CREATE DATABASE `linux_krozki` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_slovenian_ci */ |
+--------------+-------------------------------------------------------------------------------------------------------+

I then filled the table skupine with data like this; note the letter ž in the column opombe:

[Screenshot of the skupine table omitted]

When I compile the PDF document using LuaLaTeX in conjunction with LuaSQL, I get an error caused by that character ž:

! String contains an invalid utf-8 sequence.
l.1 Mo
    en dostop za invalide, prepoved kajenja.
l.39        Opombe: & \luadirect{skupina_opombe(arg[3])}

This is strange, because my source files predracun.lua and predracun.tex are both UTF-8 encoded.

Here is the predracun.tex source file:

\documentclass[12pt]{article}

% package for UTF-8 encoding (LuaLaTeX already reads UTF-8 input natively)
\usepackage[utf8]{luainputenc}

% package for Lua: provides \luadirect
\usepackage{luacode}
    \directlua{dofile('predracun.lua')}

\begin{document}
    \begin{tabular}{rp{11cm}}
        ŽžĐđŠšĆćČč\\
        \luadirect{skupina_opombe()}\\
    \end{tabular}
\end{document}   

And here is the predracun.lua source file:

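function skupina_opombe ()
    -- make the LuaSQL MySQL driver visible to require()
    package.cpath = package.cpath .. ";/usr/lib/x86_64-linux-gnu/lua/5.3/luasql/mysql.so"
    local luasql = require('luasql.mysql')
    local env = assert (luasql.mysql())
    local con = assert (env:connect("linux_krozki","ziga","Slovenija123"))

    -- fetch the remark (opombe) of the group linked to interest 1
    local cur = assert (con:execute("SELECT opombe FROM skupine WHERE id_skupine = (SELECT id_skupine FROM predracuni WHERE id_interesa =1);"))

    -- "a": return the row as a table indexed by column name
    local vnos = cur:fetch ({}, "a")

    -- hand the value back to TeX as input
    tex.print(
        string.format([[%s]], vnos.opombe)
    )

    -- release the cursor, connection and environment
    cur:close()
    con:close()
    env:close()
end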

I also explicitly specified \usepackage[utf8]{luainputenc} in predracun.tex. So why do I still get the error? Note that the error is not triggered by the special characters ŽžĐđŠšĆćČč typed directly into the source, but by \luadirect{skupina_opombe()}, which reads from the database.
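A check placed just before tex.print can turn TeX's generic "invalid utf-8 sequence" error into one that names the column and the offending byte. This is a minimal sketch using Lua 5.3's utf8.len; the helper name checked_utf8 is hypothetical and is not part of LuaSQL or LuaTeX. Defined as a local at the top of predracun.lua, it is visible to skupina_opombe() in the same file.

```lua
-- Hypothetical helper: return s unchanged if it is valid UTF-8,
-- otherwise raise an error naming the field and the first bad byte.
local function checked_utf8(s, field)
  -- utf8.len returns nil plus the position of the first invalid byte
  local len, bad = utf8.len(s)
  if len == nil then
    error(string.format("%s: invalid UTF-8 byte 0x%02X at position %d",
                        field, s:byte(bad), bad), 2)
  end
  return s
end

-- in skupina_opombe(), instead of printing vnos.opombe directly:
-- tex.print(checked_utf8(vnos.opombe, "opombe"))
```

This only diagnoses the problem; the bytes still have to arrive in UTF-8 for the text to print correctly.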

PS: I wasn't sure whether I should post this question on the TeX community instead, as it is a hybrid of TeX and the Lua programming language.

Upvotes: 3

Views: 3264

Answers (2)

Rick James

Reputation: 142298

For all European characters, utf8 and utf8mb4 are "identical". The two CHARACTER SETs differ only for characters that need 4 bytes in UTF-8 (code points above U+FFFF), such as some Chinese characters, some Emoji, and some obscure characters.

While doing con:execute("SET NAMES 'utf8';") right after connecting is valid, it would be better to specify the client's encoding when establishing the connection. (Sorry, I don't know how to do that in Lua.)

The link you mention just explains that if you want a pile-of-poo to look like 💩 and not be censored to ????, you must use CHARACTER SET utf8mb4, not utf8.
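To make the difference concrete: MySQL's utf8 (an alias for utf8mb3) stores at most 3 bytes per character, so it covers code points up to U+FFFF, while utf8mb4 allows the full 4-byte range. Below is a small Lua 5.3 sketch that checks whether a UTF-8 string would fit into utf8mb3; the function name fits_utf8mb3 is made up for illustration.

```lua
-- Hypothetical helper: true if every code point of s needs at most
-- 3 bytes in UTF-8 (U+0000..U+FFFF), which is all utf8mb3 can store.
local function fits_utf8mb3(s)
  for _, cp in utf8.codes(s) do
    if cp > 0xFFFF then
      return false
    end
  end
  return true
end

print(#utf8.char(0x017E))                        --> 2   (ž)
print(#utf8.char(0x1F4A9))                       --> 4   (pile of poo)
print(fits_utf8mb3("ŽžĐđŠšĆćČč"))                --> true
print(fits_utf8mb3("ok " .. utf8.char(0x1F4A9))) --> false
```

All the Slovenian and Croatian letters from the question fit in 2 bytes, which is why either character set works for them.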

Although the Eastern European characters you mention will work equally well in utf8 or utf8mb4, I recommend using utf8mb4.

Upvotes: 3

71GA

Reputation: 1391

After studying the MySQL online documentation, I found out that in the MySQL world it is not enough for both your database and the program that queries it to use UTF-8!

We also need to specify the character set of the connection every time we connect to the database!
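What goes wrong without this can be reproduced in plain Lua. The sketch below assumes the connection falls back to latin1, which in MySQL is cp1252; there ž is the single byte 0x9E, whereas UTF-8 encodes it as 0xC5 0xBE. The strings are illustrative byte sequences, not data actually read from the database. Lua 5.3's utf8.len accepts the UTF-8 version and rejects the latin1 one, regardless of how the .tex and .lua files are encoded.

```lua
-- "Možen dostop" as UTF-8 (ž = C5 BE) and as latin1/cp1252 (ž = 9E)
local as_utf8   = "Mo\xC5\xBEen dostop"
local as_latin1 = "Mo\x9Een dostop"

-- utf8.len counts code points, or returns nil plus the byte
-- position of the first invalid sequence
print(utf8.len(as_utf8))    --> 12
print(utf8.len(as_latin1))  --> nil  3
```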

This was a big surprise for me, and I solved my problem by adding one line of code to predracun.lua. This line saved my day:

cur = assert (con:execute("SET NAMES 'utf8';"))

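It tells the MySQL server that from this point on the connection should operate entirely in UTF-8 (it sets the session variables character_set_client, character_set_connection and character_set_results). So this line has to be executed after connecting and before the line that reads data from the database: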

cur = assert (con:execute("SELECT opombe FROM skupine WHERE id_skupine = (SELECT id_skupine FROM predracuni WHERE id_interesa =1);"))
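Because SET NAMES applies only to the current session, one way to avoid forgetting it is to issue it in a small wrapper right after env:connect. This is a sketch assuming the LuaSQL MySQL driver used in the question; connect_utf8mb4 is a hypothetical helper name, and it uses utf8mb4 to match the database's character set.

```lua
-- Hypothetical wrapper: open a LuaSQL connection and immediately declare
-- that this session sends and receives utf8mb4, before any SELECT runs.
local function connect_utf8mb4(env, db, user, password)
  local con = assert(env:connect(db, user, password))
  -- sets character_set_client, _connection and _results for this session only
  assert(con:execute("SET NAMES 'utf8mb4'"))
  return con
end

-- usage inside skupina_opombe():
-- local luasql = require('luasql.mysql')
-- local env = assert(luasql.mysql())
-- local con = connect_utf8mb4(env, "linux_krozki", "ziga", "Slovenija123")
```

For a statement like SET NAMES, LuaSQL's execute returns a row count (0), which is truthy in Lua, so the assert only fires on an actual error.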

The only question that still remains is:

Are the database encodings utf8 and utf8mb4 compatible, or should I make my database utf8 instead of utf8mb4? That article recommends that I do not... So I will probably use SET NAMES 'utf8mb4' instead.

Upvotes: 4
