A. De Bruyn
A. De Bruyn

Reputation: 181

String with accents in MySQL table returned as NA in R (RODBC)

I have a table called 'segments' stored in a MySQL database. The table is UTF-8 encoded (charset utf8, collation utf8_general_ci), and the column 'label' contains strings with accents, like "Fidèles", "Arrêtés", etc.

I query that table using R, and the RODBC package, with a simple:

data = sqlQuery(channel = myodbcconnection, query = "SELECT label FROM segments")

Doing so will result in all strings containing accents being replaced with NA values. It's not only the accents being replaced by filling characters. The entire string becomes NA.

What's even weirded is that if I run an UPDATE on the database from R, with something like:

sqlQuery(channel = myodbcconnection, query = "UPDATE segments SET label = 'Testé et approuvé' WHERE id = 70")

The database is updated correctly. But if I select it back, it returns an NA value.

This is driving me crazy. ;-) Any help would be greatly appreciated.

Upvotes: 1

Views: 515

Answers (1)

A. De Bruyn
A. De Bruyn

Reputation: 181

After much pain, I found a partial answer to my own question.

The problem was, as expected, with the ODBC connection, and the way it coded utf8 communications. In the ODBC connection string, you have to specify CharSet=utf8. For instance:

Driver={MySQL ODBC 3.51 Driver};Server=localhost;Database=myDataBase; User=myUsername;Password=myPassword;Option=3;CharSet=utf8;

Note that:

  • It is not enough to select the ODBC Unicode (utf8-compatible) driver.
  • Specifying "DBMSencoding" as utf8 in the odbcConnect function was not sufficient either.
  • This fixed the issue on my localhost, but not in my production environment (Ubuntu/AWS), for unknown reason.

Upvotes: 3

Related Questions