Antoine

Reputation: 1719

RStudio character encoding issue: quotation marks replaced by \x92

I am using read.csv to read a file containing some naturally occurring text. Sometimes the text uses ' as an apostrophe, and sometimes it uses ’ instead (see lines 2 and 6 of this table).

When I read the file in RStudio on my laptop, there is no issue (both ' and ’ appear in the text). However, when I read the file in RStudio Server (on an EC2 instance), every ’ is replaced by \x92, which is a problem.

Following the first bullet point of the first answer to this question, I tried changing the encoding via the global options menu in RStudio Server: Unicode, UTF-8, UTF-16, Windows-1252, ISO8859-1, etc.

Unfortunately, regardless of my selection, the same issue arises every time.

Thanks a lot in advance for any help.

Upvotes: 1

Views: 1893

Answers (1)

Antoine

Reputation: 1719

I just found a solution so I am answering my own question:

Somehow my attempts to set the encoding via the global options menu in RStudio Server had no effect on read.csv. I thought read.csv was supposed to default to the encoding given by getOption("encoding"), but that does not always seem to be the case.

Anyway, once I specified the encoding directly in read.csv using the fileEncoding argument and inspected the data, I could see that my different encoding selections now had an effect. After a couple of trials, I found that "Windows-1252" gave me what I wanted.
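
For anyone who wants to reproduce this, here is a minimal sketch of the fix. It is not my real file: it writes a small temporary CSV in which the curly apostrophe is stored as the single byte 0x92 (which is how Windows-1252 encodes ’), then reads it back with fileEncoding. The column names and sample text are made up for illustration, and I am assuming the R session runs in a UTF-8 locale so the converted character can be represented.

```r
# Build a tiny CSV where the curly apostrophe is the raw byte 0x92,
# as it would be in a file saved as Windows-1252 (sample data is hypothetical).
path <- tempfile(fileext = ".csv")
writeBin(
  c(charToRaw("id,text\n1,don"), as.raw(0x92), charToRaw("t\n2,don't\n")),
  path
)

# Declare the file's encoding so R converts the bytes while reading.
df <- read.csv(path, fileEncoding = "Windows-1252", stringsAsFactors = FALSE)

# Check whether row 1 now contains the right single quotation mark (U+2019).
has_curly <- grepl("\u2019", df$text[1], fixed = TRUE)
print(df)
print(has_curly)
```

If has_curly comes back FALSE or the text still shows \x92, the file is probably in a different encoding (or the locale cannot represent the character), and trying other fileEncoding values is the next step.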

Upvotes: 1
