Reputation: 3542
Suppose I have a csv file looks like this:
Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE
A,3,"","I have comma, ha!",I have open double quotes",A,""
desired output should be:
df <- data.frame(Type='A',ID=3, NAME=NA, CONTENT='I have comma, ha!',
RESPONSE='I have open double quotes\"', GRADE=A, SOURCE=NA)
df
Type ID NAME CONTENT RESPONSE GRADE SOURCE
1 A 3 NA I have comma, ha! I have open double quotes" A NA
I tried to use read.csv
, since the data provider uses quote to escape comma in the string, but they forgot to escape double quotes in string with no comma, so no matter whether I disable quote in read.csv
I won't get desired output.
How can I do this in R? Other package solutions are also welcome.
Upvotes: 12
Views: 29996
Reputation: 26446
This is not valid CSV, so you'll have to do your own parsing. But, assuming the convention is as follows, you can just toggle with scan
to take advantage of most of its abilities:
next_field<-function(stream) {
p<-seek(stream)
d<-readChar(stream,1)
seek(stream,p)
if(d=="\"")
field<-scan(stream,"",1,sep=",",quote="\"",blank=FALSE)
else
field<-scan(stream,"",1,sep=",",quote="",blank=FALSE)
return(field)
}
Assuming the above convention, this sufficient to parse as follows
s<-file("example.csv",open="rt")
header<-readLines(s,1)
header<-scan(what="",text=header,sep=",")
line<-replicate(length(header),next_field(s))
setNames(as.data.frame(lapply(line,type.convert)),header)
Type ID NAME CONTENT RESPONSE GRADE SOURCE 1 A 3 NA I have comma, ha! I have open double quotes" A NA
However, in practice you might want to first write back the fields, quoting each, to another file, so you can just read.csv
on the corrected format.
Upvotes: 2
Reputation: 49448
fread
from data.table
handles this just fine:
library(data.table)
fread('Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE
A,3,"","I have comma, ha!",I have open double quotes",A,""')
# Type ID NAME CONTENT RESPONSE GRADE SOURCE
#1: A 3 I have comma, ha! I have open double quotes" A
Upvotes: 10
Reputation: 844
I'm not too sure about the structure of CSV files, but you said the author had escaped the comma in the text under content.
This works to read the text as is with the "
at the end.
read.csv2("Test.csv", header = T,sep = ",", quote="")
Upvotes: 2