user8428755

R to MySQL throws the error "could not run statement: Invalid utf8mb4 character string"

What is the best way to write Polish characters to MySQL using R?

I tried to send an R data.frame to my local MySQL database. The data.frame includes Polish characters like ł.

library(RMySQL)  # provides MySQL() and the DBI generics
mydb <- dbConnect(MySQL(), user = 'root', password = '1234', dbname = 'semstorm1', host = 'localhost')
dbWriteTable(mydb, "dane3", dane2, append = TRUE, row.names = FALSE)

I get the error

could not run statement: Invalid utf8mb4 character string

EDIT

When I use INSERT directly in MySQL, it works fine (code example below):

INSERT INTO test1 VALUES ("AAAAŁłśśś")

When I insert data from R with dbSendQuery (code below)

dbSendQuery(mydb, "insert into test1 VALUES ('asdllllłłśżżż')")

the stored value is asdllll³³œ¿¿¿

When I run

dbWriteTable(mydb, "dane3", dane2, append = TRUE, row.names = FALSE)

I get the error

could not run statement: Invalid utf8mb4 character string: 'praca bia'

Upvotes: 1

Views: 2220

Answers (1)

Rick James

Reputation: 142518

There are several places where you need to establish which encoding is being used. It seems that you are using MySQL 8.0.

The character sets for the client and for the tables do not have to be the same. MySQL should be able to convert Polish characters between cp852 and utf8mb4 (aka UTF-8), provided the client's encoding is declared correctly.

The stroke-l (ł) is hex 88 in cp852 and hex C582 in utf8mb4.

If the client has "88", but the setup says that the client is using utf8mb4, then that error message will occur.
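The mismatch can be checked from R itself. This is a minimal sketch using only base R (`validUTF8()` needs R 3.3.0 or later); the lone byte 0x88 stands in for what a cp852 client would send, as described above.

```r
# UTF-8 bytes of the stroke-l (U+0142): a two-byte sequence.
utf8_l <- enc2utf8("\u0142")
print(charToRaw(utf8_l))      # c5 82

# A lone byte 0x88: the cp852 encoding of the same letter, per the text above.
legacy_l <- rawToChar(as.raw(0x88))

# 0x88 is a UTF-8 continuation byte, so on its own it is not valid UTF-8.
# A server told to expect utf8mb4 has to reject it.
print(validUTF8(utf8_l))      # TRUE
print(validUTF8(legacy_l))    # FALSE
```

If `validUTF8()` returns FALSE for the strings in your data.frame, the data is not in the encoding that the connection claims to use.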

Here are my crude notes on R, assuming you want utf8/utf8mb4; change to cp852 if the client is really using, for example, "88".

R / RStudio

In RStudio: Tools -> Global Options -> Code -> Saving, and set the default text encoding to UTF-8.

rs <- dbSendQuery(con, 'set character set "utf8"')
rs <- dbSendQuery(con, 'SET NAMES utf8')

options(encoding = "UTF-8") at the top of my main script from which I call my package seems to fix the issue with having non-ascii characters in my package code.

read_chunk(lines = readLines("TestSpanishText.R", encoding = "UTF-8")) (also file())
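Applied to the `dbWriteTable` call from the question, the notes above might be combined as in the sketch below. It rests on several assumptions: `to_utf8_columns` and `write_utf8_table` are hypothetical helper names (not part of DBI or RMySQL), the DBI package is installed, the strings in the data.frame are correctly marked in R, and `SET NAMES utf8mb4` is used in place of `utf8`. The column name `fraza` is only a stand-in.

```r
# Hypothetical helper: re-encode every character column of a data.frame to UTF-8.
# enc2utf8() converts from the session's native encoding (or a declared
# latin1/UTF-8 mark), so it assumes the strings are correctly marked in R.
# Factor columns are left untouched.
to_utf8_columns <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  if (any(is_chr)) {
    df[is_chr] <- lapply(df[is_chr], enc2utf8)
  }
  df
}

# Hypothetical wrapper: declare the client encoding, then append the rows.
# It needs an open DBI connection; it is only defined here, not called.
write_utf8_table <- function(con, table, df) {
  DBI::dbExecute(con, "SET NAMES utf8mb4")
  DBI::dbWriteTable(con, table, to_utf8_columns(df),
                    append = TRUE, row.names = FALSE)
}

# Check the conversion on a small stand-in for the question's data.
dane2 <- data.frame(fraza = "praca bia\u0142a", n = 1L, stringsAsFactors = FALSE)
clean <- to_utf8_columns(dane2)
print(all(validUTF8(clean$fraza)))
```

With a live connection, the call would look like `write_utf8_table(mydb, "dane3", dane2)`. Whether this alone is enough depends on the other settings in the checklist referenced below being consistent.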

Character Encoding

See also the "best practice" section in "Trouble with UTF-8 characters; what I see is not what I stored" for a checklist of things that need to be set consistently.

In the long run, it is probably advisable to use only utf8mb4, leaving the plethora of other encodings only for initial importing of old text.
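A one-time conversion at import could look like the following base R sketch. latin1 is used here only as a stand-in legacy encoding, since R's `iconv()` accepts that name on every platform; for real data, `from` would have to be whatever encoding the old text actually uses.

```r
# Bytes of "café" as an old latin1 file would store them (0xE9 = é).
legacy <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))

# Convert once, at import time; everything downstream then sees UTF-8 only.
converted <- iconv(legacy, from = "latin1", to = "UTF-8")

print(validUTF8(legacy))      # FALSE: a lone 0xE9 is not valid UTF-8
print(validUTF8(converted))   # TRUE
print(charToRaw(converted))   # 63 61 66 c3 a9
```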

Upvotes: 1
