Angelika

Reputation: 216

Reading special file using read.csv

I have a .csv file that looks like this:

"Col1, Col2, Col3, Col4"
"1, ""Hello, world!"", 7, 5"
"4, ""Name"", 4, 3"

I need to read this file using read.csv. With sep = "," I get a data frame full of NAs. With sep = "\t" I get a single column in which each row is parsed as one string. I also tried quote = "\"", but that doesn't work either.
How can I ignore quotes and read my file correctly?
The expected result is:

Col1 Col2          Col3 Col4
1    Hello, world! 7    5
4    Name          4    3

Upvotes: 0

Views: 35

Answers (2)

G. Grothendieck

Reputation: 269431

Assuming myfile.csv is created reproducibly as shown in the Note at the end, use read.csv twice. The first read.csv strips the outer quotes from each line, and the second read.csv then parses what is left.

read.csv(text = read.csv("myfile.csv", header = FALSE)[[1]])

giving:

  Col1           Col2 Col3 Col4
1    1  Hello, world!    7    5
2    4           Name    4    3
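
To make the two passes easier to follow, here is a minimal sketch that splits the one-liner into separate steps. It assumes R 4.0 or later (so strings are not converted to factors) and writes the sample data to a temporary file; the names path, inner and df are illustrative only. The final trimws step is an addition, not part of the answer above: it removes the space that may survive after each comma in character columns.

```r
# Recreate the sample file from the question in a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c(
  '"Col1, Col2, Col3, Col4"',
  '"1, ""Hello, world!"", 7, 5"',
  '"4, ""Name"", 4, 3"'
), path)

# Pass 1: each line is a single quoted field, so this returns one column
# with the outer quotes removed and the doubled quotes collapsed.
inner <- read.csv(path, header = FALSE)[[1]]
print(inner)

# Pass 2: parse the unwrapped lines as ordinary CSV text.
df <- read.csv(text = inner)

# Trim leading/trailing spaces in character columns (added step).
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], trimws)
str(df)
```

Printing inner shows why the second pass works: after the first pass each element is a normal CSV line in which only the text field is quoted.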

Note

Lines <- '"Col1, Col2, Col3, Col4"\n"1, ""Hello, world!"", 7, 5"\n"4, ""Name"", 4, 3"\n'
cat(Lines, file = "myfile.csv")

Upvotes: 1

akrun

Reputation: 886938

One option is to read the file with readLines, remove the quotes, use a regex to replace every comma that is not between two lowercase letters with a different delimiter, and then read the result with read.table.

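# Drop all double quotes, then turn every comma into ";" unless it sits between two lowercase letters
df1 <- read.table(text = gsub("([a-z]),\\s*([a-z])(*SKIP)(*F)|,", ";",
                              gsub('"+', "", readLines("file1.csv")), perl = TRUE),
                  sep = ";", strip.white = TRUE, header = TRUE)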

-output

df1
#  Col1          Col2 Col3 Col4
#1    1 Hello, world!    7    5
#2    4          Name    4    3


str(df1)
#'data.frame':  2 obs. of  4 variables:
# $ Col1: int  1 4
# $ Col2: chr  "Hello, world!" "Name"
# $ Col3: int  7 4
# $ Col4: int  5 3
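
The nested call above can be unpacked into its intermediate steps, which may make the regex easier to check. This is only a sketch: the sample lines are hard-coded in place of readLines("file1.csv"), and the names raw, step1 and step2 are illustrative.

```r
# Sample lines from the question, as readLines() would return them.
raw <- c(
  '"Col1, Col2, Col3, Col4"',
  '"1, ""Hello, world!"", 7, 5"',
  '"4, ""Name"", 4, 3"'
)

# Step 1: remove every run of double quotes.
step1 <- gsub('"+', "", raw)

# Step 2: (*SKIP)(*F) discards a comma that sits between two lowercase
# letters; every remaining comma matches the second alternative and
# is replaced with ";".
step2 <- gsub("([a-z]),\\s*([a-z])(*SKIP)(*F)|,", ";", step1, perl = TRUE)
print(step2)
# "Col1; Col2; Col3; Col4"  "1; Hello, world!; 7; 5"  "4; Name; 4; 3"

# Step 3: parse with the new delimiter, trimming the leftover spaces.
df1 <- read.table(text = step2, sep = ";", strip.white = TRUE, header = TRUE)
```

Note that the pattern only protects commas between lowercase letters, so a text value such as "Smith, John" would still be split at its comma; the character classes would need adjusting for data like that.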

Upvotes: 0
