Daniel D.
Daniel D.

Reputation: 13

R read SQL in multiple encoding store and work in utf-8

my Rstudio Project and System Setup is on UTF8

sadly my database contains some old legacy tables in latin1

i want to pull the latin table into my UTF8 workflow without killing all my öüä :D

some scripts later on in my workflow dont like mixed encoding.

### connect db
my_db <- dbPool(RMySQL::MySQL(),
%%%% my private DB-credentials %%%% )

### read data
my_latin<-my_db %>%  dbGetQuery("SELECT * FROM legacy_sqltable;")
my_utf8<-my_db %>%  dbGetQuery("SELECT * FROM new_sqltable;")

### join data
fulldata_in_utf8 <- full_join(my_latin,my_utf8)

Upvotes: 1

Views: 2079

Answers (3)

daviewales
daviewales

Reputation: 2689

Alternative answer for MS SQL Server:

If you're using odbc or DBI::dbConnect to connect to your database, you simply need to specify the database encoding. Everything else should be automatic.

e.g.

con <-DBI::dbConnect(odbc::odbc(), DSN_name,
                UID=<username>,
                PWD=<password>,
                Database=<database>,
                encoding="latin1") # <---- Specify the database encoding here!

Upvotes: 4

Rick James
Rick James

Reputation: 142278

  1. Specify that R will work only in utf8 (or utf8mb4)
  2. MySQL, as it reads a latin1 column, will convert the characters to utf8.

You would have to work harder to get latin1 bytes from some SELECTs and utf8 bytes from other SELECTs.

I don't happen to know the R way of establishing the client as using utf8 during the connection, but here is a way to do it after connecting:

rs <- dbSendQuery(con, 'SET NAMES utf8')

If you are JOINing a tables via columns with different CHARACTER SETs (or COLLATIONs), it may work. (There are a few cases where it will spit at you, but they can be worked around.)

To do a blanket change of all the columns in a particular table:

 ALTER TABLE mytable CONVERT TO CHARACTER SET utf8mb4;

Note: Outside MySQL, it is called UTF-8. Inside MySQL it is either utf8 (for the 3-byte subset) or utf8mb4 (for the full UTF-8 set).

Upvotes: 0

Eddyvonb
Eddyvonb

Reputation: 93

You can adjust the encoding of a variable in R with the function iconv or Encoding.

Here is the link to the page. The following function works for changing the encoding of your dataframe to latin1 or just making sure that the r knows that is latin-1

Encoding(my_lating) <- "latin1"

If you want to successfully join them, I suggest to encode the data frames to the same encoding.

Upvotes: 0

Related Questions