Reputation: 1391
I am on Debian Linux and I am trying to generate a PDF from information in my database linux_krozki. To do this I first created the database with the utf8mb4 character set and the utf8mb4_slovenian_ci collation. I didn't use the utf8 character set and the utf8_slovenian_ci collation, based on this topic.
MariaDB [(none)]> SHOW CREATE DATABASE linux_krozki;
+--------------+-------------------------------------------------------------------------------------------------------+
| Database     | Create Database                                                                                       |
+--------------+-------------------------------------------------------------------------------------------------------+
| linux_krozki | CREATE DATABASE `linux_krozki` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_slovenian_ci */ |
+--------------+-------------------------------------------------------------------------------------------------------+
I then filled the table skupine with data; note the letter ž in the column opombe.
When I compile the PDF document using LuaLaTeX together with LuaSQL, I get an error because of the character ž:
! String contains an invalid utf-8 sequence.
l.1 Mo
en dostop za invalide, prepoved kajenja.
l.39 Opombe: & \luadirect{skupina_opombe(arg[3])}
This is weird, because my source files predracun.lua and predracun.tex are both UTF-8 encoded.
Here is the predracun.tex source file:
\documentclass[12pt]{article}
% package for UTF-8 encoding
\usepackage[utf8]{luainputenc}
% package for lua
\usepackage{luacode}
\directlua{dofile('predracun.lua')}
\begin{document}
\begin{tabular}{rp{11cm}}
ŽžĐđŠšĆćČč\\
\luadirect{skupina_opombe()}\\
\end{tabular}
\end{document}
And here is the predracun.lua source file:
function skupina_opombe ()
  package.cpath = package.cpath .. ";/usr/lib/x86_64-linux-gnu/lua/5.3/luasql/mysql.so"
  luasql = require('luasql.mysql')
  env = assert (luasql.mysql())
  con = assert (env:connect("linux_krozki","ziga","Slovenija123"))
  cur = assert (con:execute("SELECT opombe FROM skupine WHERE id_skupine = (SELECT id_skupine FROM predracuni WHERE id_interesa =1);"))
  vnos = cur:fetch ({}, "a")
  tex.print(
    string.format([[%s]], vnos.opombe)
  )
end
I also explicitly specified \usepackage[utf8]{luainputenc} in predracun.tex. So why do I still get the error? Note that the error isn't triggered by the special characters ŽžĐđŠšĆćČč but by \luadirect{skupina_opombe()}, which reads from the database.
PS: I wasn't sure whether I should post this topic on the TeX community instead, as it is a hybrid of TeX and the programming language Lua.
Upvotes: 3
Views: 3264
Reputation: 142298
For all European characters, utf8 and utf8mb4 are "identical". The two CHARACTER SETs differ only for some Chinese characters and some Emoji (plus some obscure characters).
While running con:execute("SET NAMES 'utf8';") right after connecting is valid, it would be better to specify the client's encoding when establishing the connection. (Sorry, I don't know how to do that in Lua.)
The link you mention just explains that if you want a pile-of-poo to look like 💩 and not be censored to ????, you must use CHARACTER SET utf8mb4, not utf8.
Although the Eastern European characters you mention will work equally well in utf8 or utf8mb4, I recommend using utf8mb4.
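To make the difference concrete, here is a small sketch in plain Lua 5.3 (no database needed) that looks at how many bytes a character takes in UTF-8. It relies on the fact that MySQL's utf8 stores at most 3 bytes per character, while utf8mb4 stores up to 4. The helper name fits_in_utf8mb3 is made up for this illustration.

```lua
-- Sketch: check whether a string needs utf8mb4 (requires Lua 5.3+ for the utf8 library).
-- fits_in_utf8mb3 is a hypothetical helper, not part of MySQL or LuaSQL.
local function fits_in_utf8mb3(s)
  -- utf8.codes raises an error on invalid input, so s must be valid UTF-8.
  for _, cp in utf8.codes(s) do
    -- Code points above U+FFFF need 4 bytes in UTF-8.
    if cp > 0xFFFF then
      return false
    end
  end
  return true
end

local z_caron = utf8.char(0x17E)    -- ž
local poo     = utf8.char(0x1F4A9)  -- 💩

print(#z_caron, fits_in_utf8mb3(z_caron))  -- 2 bytes, fits in utf8
print(#poo, fits_in_utf8mb3(poo))          -- 4 bytes, needs utf8mb4
```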
Upvotes: 3
Reputation: 1391
After studying the MySQL online documentation I found out that in the MySQL world it is not enough for the database and the program that calls it to use UTF-8 encoding!
We also need to specify the UTF-8 encoding for every connection to the database!
This was a big surprise for me, and I managed to solve my problem by adding one line of code to my predracun.lua. This line saved my day:
cur = assert (con:execute("SET NAMES 'utf8';"))
It tells the MySQL server that from this point on the connection should operate entirely in UTF-8 encoding. So this line has to come before the line that reads data from the database:
cur = assert (con:execute("SELECT opombe FROM skupine WHERE id_skupine = (SELECT id_skupine FROM predracuni WHERE id_interesa =1);"))
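A small sketch of why the error appears without SET NAMES, under the assumption that the connection defaulted to MySQL's latin1 (cp1252) character set; the actual default on this system is not shown in the post. In that encoding ž is the single byte 0x9E, which is not valid UTF-8 on its own, whereas the UTF-8 encoding of ž is the two bytes 0xC5 0xBE. This would also match the TeX error output, where the text breaks between "Mo" and "en". Plain Lua 5.3, no database required:

```lua
-- Assumption: without SET NAMES the server sent the text in latin1 (cp1252).
local from_latin1 = "Mo\x9Een dostop"     -- ž as the single byte 0x9E
local from_utf8   = "Mo\u{17E}en dostop"  -- ž as the bytes 0xC5 0xBE

-- utf8.len returns the character count, or nil plus the position of the
-- first invalid byte when the string is not valid UTF-8.
print(utf8.len(from_utf8))    -- 12
print(utf8.len(from_latin1))  -- nil, then 3 (position of the invalid byte)
```

A check like this on the value returned by cur:fetch can help tell an encoding problem on the connection apart from a problem in the .tex or .lua source files.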
The only question that still remains is:
Are the database encodings utf8 and utf8mb4 compatible, or should I make my database utf8 instead of utf8mb4? That article recommends I do not... So I will probably rather use SET NAMES 'utf8mb4'.
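Putting it together, here is a minimal sketch of the query with the character set selected first. It assumes a LuaSQL connection object like con from predracun.lua. The function name fetch_opombe is made up here, and the connection is passed in as an argument so the function can be tried with any object that offers the same execute method.

```lua
-- Sketch only: fetch_opombe is a hypothetical helper, not part of LuaSQL.
-- `con` is expected to be an open LuaSQL connection (as in predracun.lua).
local function fetch_opombe(con)
  -- Ask the server to talk to this client in utf8mb4 before any SELECT.
  assert(con:execute("SET NAMES 'utf8mb4'"))

  local cur = assert(con:execute(
    "SELECT opombe FROM skupine WHERE id_skupine = " ..
    "(SELECT id_skupine FROM predracuni WHERE id_interesa = 1)"))

  local row = cur:fetch({}, "a")  -- row indexed by column name
  cur:close()
  -- Returns nil when the query produced no row.
  return row and row.opombe
end
```

In predracun.lua the result could then be handed to tex.print after the connection has been opened with env:connect, as in the original function.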
Upvotes: 4