Reputation: 383
I'm parsing a text file from a French hydrological database that contains lines like this:
Date Q (m3/s) Validité F. exp. Libellé Fréquence exp
When R reads these lines with either read.csv or readLines, the accents are escaped with codes to form this:
Date Q (m3/s) Validit\xe9 F. exp. Libell\xe9 Fr\xe9quence exp
These escaped bytes break even simple grepl calls. For example:
grepl("Date", "Date Q (m3/s) Validit\xe9 F. exp. Libell\xe9 Fr\xe9quence exp")
Produces the following result:
[1] FALSE
Warning message:
In grepl("Date", "Date Q (m3/s) Validit\xe9 F. exp. Libell\xe9 Fr\xe9quence exp") :
input string 1 is invalid in this locale
What is the best way to deal with these escape codes so that I can apply simple text processing?
Upvotes: 1
Reputation: 476
Give this a try:
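# \xe9 is the Latin-1 byte for é (UTF-8 would be \xc3\xa9), so declare the file as latin1
con <- file('g:/filename.txt', "r", encoding = 'latin1')
namc <- readLines(con)
close(con)
cat(namc, sep = "\n")  # print one line of the file per output line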
Remember to change the filename and path. You should be able to use grepl and gsub to clean it up after that.
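The \xe9 in the question is the Latin-1 byte for é, so the file is probably Latin-1 (or Windows-1252) rather than UTF-8. Here is a minimal sketch of two ways to handle that. It assumes the session runs in a UTF-8 locale, and the temporary file is only a hypothetical stand-in for the real export:

```r
# Build a small Latin-1 test file (hypothetical stand-in for the real data).
utf8_line <- "Date Q (m3/s) Validit\u00e9 F. exp. Libell\u00e9 Fr\u00e9quence exp"
tmp <- tempfile(fileext = ".txt")
writeLines(iconv(utf8_line, from = "UTF-8", to = "latin1"), tmp, useBytes = TRUE)

# Option 1: declare the source encoding on the connection, so readLines
# re-encodes to the session's native encoding (assumed to be UTF-8 here).
con <- file(tmp, "r", encoding = "latin1")
lines1 <- readLines(con)
close(con)

# Option 2: read the raw bytes first, then convert explicitly with iconv.
raw_lines <- readLines(tmp)
lines2 <- iconv(raw_lines, from = "latin1", to = "UTF-8")

# Pattern matching now works on plain and accented text alike.
has_date <- grepl("Date", lines2)
has_freq <- grepl("Fr\u00e9quence", lines2)
```

For read.csv, passing fileEncoding = "latin1" should have the same effect as Option 1. If the file turns out to be Windows-1252 instead, swapping in "CP1252" is worth trying, though whether that name is accepted depends on the platform's iconv.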
Upvotes: 1