Reputation: 1721
I am reading a table from a SQL database, with a column of cities in (English, Arabic and possibly other languages). The encoding type for all is Unknown, when I try to force them using Encoding, some change and some remain Unknown.
I tried using dbGetQuery but it is the same :
con <- dbConnect(RMySQL::MySQL(), host = "***",dbname="***",user = "***", password = "***")
dbGetQuery(con,"set names utf8")
Q1 <- dbSendQuery(con, "SELECT * FROM cities")
city <- fetch(Q1, n = -1)
> Encoding(city$name) %>% table()
.
unknown
45734
When I force the change, some get converted, but entries with Arabic characters for example do not get converted.
> Encoding(city$name) <- "UTF-8"
> Encoding(city$name) %>% table()
.
unknown UTF-8
44920 814
And here is the result of SHOW VARIABLES LIKE 'character_set_%
dbSendQuery(con, "SET NAMES UTF8; ")
<MySQLResult:8,3,3>
> dbGetQuery(con, "SHOW VARIABLES LIKE 'character_set_%'")
Variable_name Value
1 character_set_client utf8
2 character_set_connection utf8
3 character_set_database latin1
4 character_set_filesystem binary
5 character_set_results utf8
6 character_set_server latin1
7 character_set_system utf8
8 character_sets_dir /usr/share/mysql/charsets/
And here's the session info
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] stringi_1.1.2 tidyr_0.5.1 purrr_0.2.2 dplyr_0.5.0 RMySQL_0.10.9 DBI_0.4-1
loaded via a namespace (and not attached):
[1] magrittr_1.5 R6_2.1.2 assertthat_0.1 tools_3.3.1 tibble_1.1 Rcpp_0.12.6
And here's the encoding in the db: table info from php my admin
Upvotes: 2
Views: 759
Reputation: 5704
As said in the docs:
ASCII strings will never be marked with a declared encoding, since their representation is the same in all supported encodings.
Illustration:
v <- c("café", "floor", "window", "naïve")
Encoding(v) <- "UTF-8"
Encoding(v)
# [1] "UTF-8" "unknown" "unknown" "UTF-8"
Some of your city names are in English, so they probably don't contain non-ASCII characters.
Upvotes: 1