Reputation:
What is the best way to write Polish characters to MySQL using R?
I tried to send an R data.frame to my local MySQL database. The data.frame includes Polish characters like ł.
library(RMySQL)  # also attaches DBI
mydb <- dbConnect(MySQL(), user='root', password='1234', dbname='semstorm1', host='localhost')
dbWriteTable(mydb,"dane3", dane2, append = T, row.names = F)
I get the error
could not run statement: Invalid utf8mb4 character string
EDIT
When I run an INSERT directly in the MySQL client, it works fine (code example below):
INSERT INTO test1 VALUES ("AAAAŁłśśś")
When I insert data from R with dbSendQuery (code below):
dbSendQuery(mydb, "insert into test1 VALUES ('asdllllłłśżżż')")
the stored value is asdllll³³œ¿¿¿
When I use dbWriteTable:
dbWriteTable(mydb,"dane3", dane2, append = T, row.names = F)
I get the error: could not run statement: Invalid utf8mb4 character string: 'praca bia'
Upvotes: 1
Views: 2220
Reputation: 142518
There are several places where you need to declare the encoding being used. It seems that you are using MySQL 8.0.
The character sets for the client and for the tables do not have to be the same. MySQL should be able to convert Polish letters between cp852 and utf8mb4 (MySQL's name for full UTF-8), provided it is told correctly which one the client is sending.
The stroked l (ł, U+0142) is hex 88 in cp852 and hex C582 in utf8mb4.
If the client sends the byte 88 but the connection settings say that the client is using utf8mb4, that error message will occur, because a lone 88 byte is not valid UTF-8.
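To check which bytes are actually involved, here is a minimal sketch in plain R (no database needed). It prints the encoding of ł in UTF-8 and cp852, confirms that the cp852 byte is not valid UTF-8, and, under the assumption that the R session produces Windows-1250 text (common on Polish Windows) which is then read as latin1 somewhere along the way (MySQL's latin1 is really cp1252), reproduces a garbling consistent with the one in the question (łłśżżż stored as ³³œ¿¿¿). The Windows-1250 assumption is a guess, not something the question confirms.

```r
# Inspect how the stroked l (U+0142) is encoded in the character sets above.
l_stroke <- "\u0142"  # "ł"; a \u escape yields a UTF-8 string in R

utf8_bytes  <- charToRaw(l_stroke)
cp852_bytes <- iconv(l_stroke, from = "UTF-8", to = "CP852", toRaw = TRUE)[[1]]
print(utf8_bytes)   # c5 82
print(cp852_bytes)  # 88

# A lone 0x88 byte is not valid UTF-8, so a server expecting utf8mb4 rejects it.
print(validUTF8(rawToChar(cp852_bytes)))  # FALSE

# Assumption: the client produces Windows-1250 bytes, which are then
# interpreted as latin1 (implemented by MySQL as cp1252).
polish <- "\u0142\u0142\u015b\u017c\u017c\u017c"  # "łłśżżż"
cp1250_bytes <- iconv(polish, from = "UTF-8", to = "CP1250", toRaw = TRUE)[[1]]
print(cp1250_bytes)  # b3 b3 9c bf bf bf

# Decode the same bytes as cp1252 to reproduce the garbled text.
garbled <- iconv(list(cp1250_bytes), from = "CP1252", to = "UTF-8")
print(garbled)  # the string "³³œ¿¿¿"
```

If your own output looks like the last line, the client is probably not sending what the connection settings claim, and either the data or the connection character set has to change.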
Here are my crude notes on R, assuming you want utf8/utf8mb4; change to cp852 if the client is really sending, for example, 88 for ł.
R / RStudio
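In RStudio, go to Tools -> Global Options -> Code -> Saving and set the default text encoding to UTF-8. On the connection, declare the character set explicitly:
dbExecute(con, "SET CHARACTER SET 'utf8mb4'")  # sets the client and results character sets
dbExecute(con, "SET NAMES utf8mb4")            # sets the client, connection and results character sets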
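Declaring utf8mb4 on the connection only helps if the strings R sends really are UTF-8. A minimal sketch, assuming the data.frame holds Windows-1250 text: re-encode its character columns before calling dbWriteTable. The helper to_utf8_df, the column name opis and the demo data are hypothetical, and the `from` encoding must be known rather than guessed.

```r
# Hypothetical helper: re-encode every character column of a data.frame
# to UTF-8. `from` must be the encoding the strings are really in.
to_utf8_df <- function(df, from) {
  for (col in names(df)) {
    if (is.character(df[[col]])) {
      df[[col]] <- iconv(df[[col]], from = from, to = "UTF-8")
    }
  }
  df
}

# Simulated input: a column holding Windows-1250 bytes for "praca biał".
cp1250_text <- iconv("praca bia\u0142", from = "UTF-8", to = "CP1250")
dane_demo <- data.frame(opis = cp1250_text, stringsAsFactors = FALSE)

fixed <- to_utf8_df(dane_demo, from = "CP1250")
print(validUTF8(dane_demo$opis))  # FALSE: byte 0xB3 is not valid UTF-8
print(validUTF8(fixed$opis))      # TRUE
```

Factor columns are left untouched by this sketch; with the converted frame, a call such as dbWriteTable(mydb, "dane3", fixed, append = TRUE, row.names = FALSE) should then send valid UTF-8.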
Putting options(encoding = "UTF-8") at the top of the main script (the one from which I call my package) seems to fix the issue of non-ASCII characters in my package code.
When reading files, declare the encoding: read_chunk(lines = readLines("TestSpanishText.R", encoding = "UTF-8")) (file() also has an encoding argument).
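A small sketch of the readLines() note: encoding = "UTF-8" only declares (marks) the strings as UTF-8 and does not convert them, so it is right only when the file really contains UTF-8 bytes. The temporary file is made up for the demonstration.

```r
# Write a file containing genuine UTF-8 bytes.
path <- tempfile(fileext = ".txt")
writeBin(charToRaw("praca bia\u0142\n"), path)

# Read it back, declaring the content as UTF-8 (marks, does not re-encode).
x <- readLines(path, encoding = "UTF-8")
print(Encoding(x))              # "UTF-8"
print(x == "praca bia\u0142")   # TRUE
```

For a file saved in another encoding, opening it with file(path, encoding = "CP1250") instead re-encodes the input to the session's native encoding while reading.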
For a checklist of things that need to be set consistently, see also the "best practice" section in the Stack Overflow question "Trouble with UTF-8 characters; what I see is not what I stored".
In the long run, it is probably advisable to use only utf8mb4, leaving the plethora of other encodings only for initial importing of old text.
Upvotes: 1