Reputation: 216
I have a .csv file that looks like this:
"Col1, Col2, Col3, Col4"
"1, ""Hello, world!"", 7, 5"
"4, ""Name"", 4, 3"
I need to read this file using read.csv. With sep = "," I get a data frame full of NA. With sep = "\t" I get a single column, with each line parsed as one string. I also tried quote = ""\", but that doesn't work either.
How can I ignore quotes and read my file correctly?
The expected result is:
Col1 Col2 Col3 Col4
1 Hello, world! 7 5
4 Name 4 3
Upvotes: 0
Views: 35
Reputation: 269431
Assuming myfile.csv is written reproducibly as in the Note at the end, use read.csv twice. The first read.csv strips the outer quotes, and the second read.csv then reads what is left.
read.csv(text = read.csv("myfile.csv", header = FALSE)[[1]])
giving:
Col1 Col2 Col3 Col4
1 1 Hello, world! 7 5
2 4 Name 4 3
Note:
Lines <- '"Col1, Col2, Col3, Col4"\n"1, ""Hello, world!"", 7, 5"\n"4, ""Name"", 4, 3"\n'
cat(Lines, file = "myfile.csv")
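For anyone who wants to see the two passes separately, here is a minimal sketch. It assumes R 4.0 or later, so that the first pass returns character strings rather than factors. The names path, inner and df are only for illustration, and strip.white = TRUE is an optional addition (not part of the one-liner above) that drops the space following each comma.

```r
# Recreate the sample file (written to a temporary file here).
Lines <- '"Col1, Col2, Col3, Col4"\n"1, ""Hello, world!"", 7, 5"\n"4, ""Name"", 4, 3"\n'
path <- tempfile(fileext = ".csv")
cat(Lines, file = path)

# Pass 1: every line is one quoted field, so this returns a single
# character column with the outer quotes removed and "" turned into ".
inner <- read.csv(path, header = FALSE)[[1]]
print(inner)

# Pass 2: parse the unwrapped lines as ordinary CSV.
# strip.white = TRUE removes the blank after each comma (assumed addition).
df <- read.csv(text = inner, strip.white = TRUE)
str(df)
```

Printing inner is a quick way to check that the first pass really produced plain CSV lines before handing them to the second read.csv.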
Upvotes: 1
Reputation: 886938
One option is to read the file with readLines, remove the quotes, use a regex to replace every comma that is not between two lowercase letters with a different delimiter (;), and then read the result with read.table
df1 <- read.table(
  text = gsub("([a-z]),\\s*([a-z])(*SKIP)(*F)|,", ";",
              gsub('"+', "", readLines("file1.csv")), perl = TRUE),
  sep = ";", strip.white = TRUE, header = TRUE)
-output
df1
# Col1 Col2 Col3 Col4
#1 1 Hello, world! 7 5
#2 4 Name 4 3
str(df1)
#'data.frame': 2 obs. of 4 variables:
# $ Col1: int 1 4
# $ Col2: chr "Hello, world!" "Name"
# $ Col3: int 7 4
# $ Col4: int 5 3
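To make the regex step easier to follow, the sketch below traces the two gsub calls on a single line. It also shows one case where the [a-z] rule splits a text field, and a possible variant that unwraps the quoting instead of deleting it. The variant is a different technique from the regex above, offered only as an illustration. The variable names and the upper-case sample line are assumptions, not part of the original data.

```r
# Trace the regex approach on one line of the file.
x <- '"1, ""Hello, world!"", 7, 5"'
no_quotes <- gsub('"+', "", x)   # 1, Hello, world!, 7, 5
pat <- "([a-z]),\\s*([a-z])(*SKIP)(*F)|,"
step2 <- gsub(pat, ";", no_quotes, perl = TRUE)
step2                            # 1; Hello, world!; 7; 5

# Limitation: only commas between two lowercase letters are protected,
# so a capital letter after the comma splits the field (hypothetical input).
upper <- gsub(pat, ";", gsub('"+', "", '"1, ""Hello, World!"", 7, 5"'),
              perl = TRUE)
upper                            # 1; Hello; World!; 7; 5

# Variant (assumption): undo the quoting rather than removing it.
Lines <- '"Col1, Col2, Col3, Col4"\n"1, ""Hello, world!"", 7, 5"\n"4, ""Name"", 4, 3"\n'
path <- tempfile(fileext = ".csv")
cat(Lines, file = path)

raw <- readLines(path)
unwrapped <- sub('^"(.*)"$', "\\1", raw)               # drop the outer quotes
unwrapped <- gsub('""', '"', unwrapped, fixed = TRUE)  # "" becomes "
df2 <- read.csv(text = unwrapped, strip.white = TRUE)
str(df2)
```

Because the variant keeps the inner quotes, read.csv decides which commas belong to a field, so it does not depend on the characters around the comma.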
Upvotes: 0