Sumedh

Reputation: 4965

Displaying UTF-8 encoded characters in R

I am using the RODBC package to read data from SQL Server. R displays the Chinese characters as "?????", even though I pass DBMSencoding = "UTF-8" to the odbcConnect function.

Following is the sample code I am using:

Connection <- odbcConnect("abc", uid = "123", pwd = "123", 
                          DBMSencoding = "UTF-8", readOnlyOptimize = TRUE)

Var1 <- sqlQuery(Connection, query, errors = TRUE, stringsAsFactors = FALSE)

Maybe I didn't pass the arguments the way I was supposed to?

sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RODBC_1.3-12

loaded via a namespace (and not attached):
[1] tools_3.2.3

odbcGetInfo(mainConnection)
             DBMS_Name               DBMS_Ver        Driver_ODBC_Ver       Data_Source_Name            Driver_Name 
"Microsoft SQL Server"           "10.50.4000"                "03.52"                                "SQLSRV32.DLL" 
            Driver_Ver               ODBC_Ver            Server_Name 
          "06.01.7601"           "03.80.0000"                        

Upvotes: 2

Views: 6940

Answers (3)

pzhao

Reputation: 335

I had the same problem and solved it. The fix was quite simple: go to Control Panel --> Region and Language --> Administrative --> Change system locale --> Chinese.
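To see which locale R is actually running under, before or after changing the system locale, something like the following may help. It is only a sketch using base R; the variable names ctype and info are my own, and the codepage element is, as far as I know, only reported on Windows.

```r
# Report the character-type locale of the current R session
ctype <- Sys.getlocale("LC_CTYPE")
info <- l10n_info()

cat("LC_CTYPE:", ctype, "\n")
cat("Multibyte locale:", info$MBCS, "\n")
cat("UTF-8 locale:", info[["UTF-8"]], "\n")

# 'codepage' is assumed to be present only on Windows builds of R
if (!is.null(info$codepage)) cat("Code page:", info$codepage, "\n")
```

In the question's sessionInfo() output, LC_CTYPE is English_United States.1252, a code page that cannot represent Chinese characters.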

Upvotes: 0

xhj

Reputation: 21

Check the database's character encoding:

select userenv('language') from dual;
SIMPLIFIED CHINESE_CHINA.AL32UTF8 

Set the NLS_LANG environment variable to match before connecting to the database:

Sys.setenv(NLS_LANG="SIMPLIFIED CHINESE_CHINA.AL32UTF8")
Connection <- odbcConnect("abc", uid = "123", pwd = "123", DBMSencoding = "UTF-8", readOnlyOptimize = TRUE)

Upvotes: 2

drammock

Reputation: 2543

R on Windows has a lot of problems displaying characters outside of ASCII, even though it is often faithfully representing them internally. There is a lot of information in this answer about why this is the case, and some simple diagnostics in this answer. First try plotting, like:

# first, make sure plotting Chinese works in general
# (i.e., you have an appropriate font)
hanzi <- "漢字"
plot(1, 1, type="n")
text(1, 1, hanzi)

If that works, replace the hanzi <- "漢字" line with your sql query line to get some Chinese text from your database into a string variable, and try plotting that. If it shows up on the plot, then the characters are being read fine and represented internally fine, and the problem is just displaying them in the console. If plotting worked for the "漢字" string variable but doesn't work for your SQL-extracted string, then at least you know that the problem is actually with the SQL part and not just with display in the console.
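If plotting is inconvenient, a rough alternative is to look at the raw bytes of the string. The sketch below uses only base R; inspect_string is a hypothetical helper of my own, not something RODBC provides, and it expects a single non-missing string. The idea is that if every byte is 0x3f (a literal "?"), the characters were most likely lost before the data reached R, whereas multi-byte sequences suggest the data is intact and only the console display is failing.

```r
# Hypothetical helper (not part of RODBC): summarise what R actually holds
# for a single string.
inspect_string <- function(x) {
  bytes <- charToRaw(x)
  list(
    declared_encoding = Encoding(x),  # "unknown", "UTF-8", "latin1" or "bytes"
    bytes = paste(as.character(bytes), collapse = " "),
    only_question_marks = length(bytes) > 0 && all(bytes == as.raw(0x3f))
  )
}

# The two hanzi from the plotting example, written as \u escapes so that
# this script stays pure ASCII
hanzi <- "\u6f22\u5b57"
str(inspect_string(hanzi))

# What a string that has already been replaced by question marks looks like
str(inspect_string("??"))
```

To try it on real data, pass one value from the data frame returned by sqlQuery instead of hanzi. If the bytes look like valid UTF-8 but declared_encoding is "unknown", marking the string with Encoding(x) <- "UTF-8" may be worth trying.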

Upvotes: 1
