Dave_L

Reputation: 383

How to convert accented text into plain text in R

I'm parsing a text file from a French hydrological database that contains lines like this:

Date    Q (m3/s)    Validité    F. exp. Libellé Fréquence exp

When R reads these lines, with either read.csv or readLines, the accented characters are shown as escape codes, like this:

Date Q (m3/s) Validit\xe9 F. exp. Libell\xe9 Fr\xe9quence exp

These escape codes break even simple grepl calls. For example:

grepl("Date", "Date Q (m3/s) Validit\xe9 F. exp. Libell\xe9 Fr\xe9quence exp")

This produces the following result:

[1] FALSE
Warning message:
In grepl("Date", "Date Q (m3/s) Validit\xe9 F. exp. Libell\xe9 Fr\xe9quence exp") :
input string 1 is invalid in this locale
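A quick way to see that the problem is the file's encoding rather than the text itself is to look at the raw bytes. This is a small diagnostic sketch using only base R; the variable name x is just for illustration. It assumes, as the warning suggests, a session running in a UTF-8 locale.

```r
# The header line exactly as R showed it: each "\xe9" is one raw byte, 0xE9.
x <- "Date Q (m3/s) Validit\xe9 F. exp. Libell\xe9 Fr\xe9quence exp"

# 0xE9 is "é" in Latin-1 / Windows-1252, but on its own it is not valid
# UTF-8, which is why grepl() reports the string as invalid.
validUTF8(x)
# [1] FALSE

# The word ends in the single byte e9 ...
charToRaw("Validit\xe9")
# [1] 56 61 6c 69 64 69 74 e9

# ... whereas UTF-8 would encode "é" as the two bytes c3 a9.
charToRaw("Validit\u00e9")
# [1] 56 61 6c 69 64 69 74 c3 a9
```

A lone e9 byte therefore points to a Latin-1 (or Windows-1252) file being read as if it were UTF-8.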

What is the best way to deal with these escape codes so that I can apply simple text processing?

Upvotes: 1

Views: 124

Answers (1)

Matt Sandgren

Reputation: 476

Give this a try:

# \xe9 is "é" in Latin-1, so declare the file as latin1 (not UTF-8)
con <- file("g:/filename.txt", "r", encoding = "latin1")
namc <- readLines(con)
close(con)
cat(namc, sep = "\n")

Remember to change the file name and path. After that, you should be able to use grepl and gsub to clean up the text.
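Below is a self-contained sketch of the same idea. It assumes the export is Latin-1 encoded (a lone 0xE9 byte is "é" in Latin-1; a UTF-8 file would contain c3 a9 instead) and that the R session runs in a UTF-8 locale. The temporary file and the variable names (path, header, raw_header, fixed_header, fields) are hypothetical stand-ins for the real database export. It shows two routes: declaring the encoding when the file is opened, and repairing lines that were already read by converting them with iconv(). read.csv() accepts the same information through its fileEncoding argument.

```r
# Hypothetical stand-in for the database export: one tab-separated header
# line written as raw Latin-1 bytes ("\xe9" is the single byte 0xE9, "é").
path <- tempfile(fileext = ".txt")
writeLines("Date\tQ (m3/s)\tValidit\xe9\tF. exp.\tLibell\xe9\tFr\xe9quence exp",
           path, useBytes = TRUE)

# Route 1: declare the file's encoding on the connection; readLines() then
# re-encodes each line to the session's native encoding (assumed UTF-8).
con <- file(path, "r", encoding = "latin1")
header <- readLines(con)
close(con)
grepl("Date", header)

# Route 2: lines already read without an encoding (as in the question) can
# be repaired in memory by converting them from Latin-1 to UTF-8.
raw_header <- readLines(path)
fixed_header <- iconv(raw_header, from = "latin1", to = "UTF-8")

# After conversion the usual text tools work without "invalid" warnings.
grepl("Date", fixed_header)
fields <- strsplit(fixed_header, "\t")[[1]]
fields

# If plain ASCII is wanted, gsub() can replace the accented letters
# (only "é" occurs in this header, so one substitution is enough here).
gsub("\u00e9", "e", fields)
```

Declaring the encoding at read time is usually the cleaner choice, because every later step sees valid strings; the iconv() route is handy when the data has already been loaded.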

Upvotes: 1
