user944351

Reputation: 1213

Encoding Issue when fetching data from MySQL DB into R

I'm using the "RMySQL" library in R to load data from a local MySQL DB into R:

con <- dbConnect(MySQL(), user="root", password="****", dbname="twitterdata", host="localhost")
dataframe <- dbGetQuery(con, "SELECT id, plaintext, category FROM table")

When I inspect the dataframe, I see a lot of garbled characters, such as the slanted apostrophe (´), which shows up as ’.

After some research, I discovered that, according to this site, some special characters (including the slanted apostrophe) are not part of the ISO-8859-1 standard but of the Windows-1252 standard.

When I run

Sys.getlocale("LC_CTYPE")

in R, it says:

"German_Austria.1252"

Doesn't that already mean I'm using the correct encoding? In my DB (default charset: UTF-8), the apostrophe is stored correctly.

I also tried adding the parameter DBMSencoding="utf-8" to the dbConnect statement, but it had no effect.

When I run

Encoding(x)

in R (where x is the character vector - a sentence), the answer is

"unknown"

Does anybody know how to solve this issue?

Thanks a lot!

Upvotes: 1

Views: 3894

Answers (1)

Márcio Mocellin

Reputation: 281

Try setting the encoding when you connect:

con <- dbConnect(MySQL(), user="root", password="****", dbname="twitterdata", host="localhost", encoding = "latin1")
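
If the strings still come back garbled, it may help to see what is happening at the byte level. The sequence ’ is what the three UTF-8 bytes of the curly apostrophe (E2 80 99) look like when they are displayed as Windows-1252, which suggests R is receiving UTF-8 bytes without knowing it. The following is only a minimal sketch that needs no database: the vector x is a hypothetical stand-in for one value of the plaintext column, built by hand from raw bytes, and the variable names fixed and converted are made up for illustration.

```r
# Hypothetical stand-in for one value returned from the DB:
# the UTF-8 bytes of "I’m" (0xE2 0x80 0x99 is the curly apostrophe U+2019).
x <- rawToChar(as.raw(c(0x49, 0xE2, 0x80, 0x99, 0x6D)))
Encoding(x)   # "unknown": R will interpret the bytes in the native locale

# Option 1: declare that the bytes are UTF-8 (the bytes are not changed).
fixed <- x
Encoding(fixed) <- "UTF-8"

# Option 2: convert the UTF-8 bytes to Windows-1252, the native encoding
# of a "German_Austria.1252" locale.
converted <- iconv(x, from = "UTF-8", to = "CP1252")

# With a live connection, the same idea could be applied to a whole column
# (assumption, untested here):
#   Encoding(dataframe$plaintext) <- "UTF-8"
```

Option 1 only changes how R labels the string, so it is the right choice when the bytes really are UTF-8. Option 2 rewrites the bytes and will return NA for characters that have no Windows-1252 equivalent.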

Upvotes: 1
