Reputation: 1213
I'm using the "RMySQL" library in R to load data from a local MySQL DB into R:
con <- dbConnect(MySQL(), user="root", password="****", dbname="twitterdata", host="localhost")
dataframe <- dbGetQuery(con, "SELECT id, plaintext, category FROM table")
When I inspect the data frame, I see many garbled characters; for example, the slanted apostrophe (´) shows up as ’.
After some research, I found that, according to this site, some special characters (including the slanted apostrophe) are not part of the ISO-8859-1 standard but are part of the Windows-1252 standard.
When I run
Sys.getlocale("LC_CTYPE")
in R, it says:
"German_Austria.1252"
Doesn't that already mean I'm using the correct encoding? In my DB (default charset: UTF-8), the apostrophe is stored correctly.
I also tried adding the parameter DBMSencoding="utf-8" to the dbConnect call, but it had no effect.
When I run
Encoding(x)
in R (where x is the character vector - a sentence), the answer is
"unknown"
Does anybody know how to solve this issue?
Thanks a lot!
Upvotes: 1
Views: 3894
Reputation: 281
Try passing encoding = "latin1" to dbConnect:
con <- dbConnect(MySQL(), user="root", password="****", dbname="twitterdata", host="localhost", encoding = "latin1")
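If the connection option alone does not change anything, another approach that may help is to fix the encoding mark on the R side. The sketch below is not tied to RMySQL; it uses only base R and assumes the driver returns valid UTF-8 bytes without an encoding mark, which is what Encoding(x) returning "unknown" suggests. The sample string and the variable names are made up for illustration.

```r
# Simulate what the driver appears to hand back: the UTF-8 bytes of "I’m"
# (U+2019, right single quotation mark) with no encoding mark attached.
raw_bytes <- as.raw(c(0x49, 0xe2, 0x80, 0x99, 0x6d))
x <- rawToChar(raw_bytes)
Encoding(x)  # "unknown": R falls back to the native locale (here CP1252)

# Interpreting those same bytes as Windows-1252 reproduces the garbled text.
garbled <- iconv(x, from = "CP1252", to = "UTF-8")

# Declaring the real encoding changes no bytes, only the mark,
# so R now treats the three bytes e2 80 99 as one character.
fixed <- x
Encoding(fixed) <- "UTF-8"
Encoding(fixed)  # "UTF-8"
```

Under the same assumption, applying this to the query result from the question would be a matter of running Encoding(dataframe$plaintext) <- "UTF-8" after dbGetQuery.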
Upvotes: 1