Bamqf

Reputation: 3542

Read csv file in R with double quotes

Suppose I have a CSV file that looks like this:

Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE
A,3,"","I have comma, ha!",I have open double quotes",A,""

The desired output is:

df <- data.frame(Type='A', ID=3, NAME=NA, CONTENT='I have comma, ha!',
                 RESPONSE='I have open double quotes\"', GRADE='A', SOURCE=NA)
df
  Type ID NAME           CONTENT                   RESPONSE GRADE SOURCE
1    A  3   NA I have comma, ha! I have open double quotes"     A     NA

I tried read.csv. The data provider wraps a string in double quotes when it contains a comma, but forgot to escape double quotes inside strings that contain no comma. As a result, read.csv does not give the desired output whether I enable or disable quote.
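The mismatch with quoting disabled can be checked by counting fields per line. This is a minimal sketch that holds the two sample lines in a character vector named example (a name introduced here for illustration) instead of reading a file:

```r
# The two lines from the sample file, held in memory for illustration.
example <- c(
  'Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE',
  'A,3,"","I have comma, ha!",I have open double quotes",A,""'
)

# With quote = "" every comma is a separator, including the one inside
# the CONTENT string, so the data row has one field more than the header.
n_fields <- count.fields(textConnection(example), sep = ",", quote = "")
print(n_fields)
```

The header yields 7 fields and the data row 8, so disabling quotes alone cannot line the columns up.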

How can I do this in R? Other package solutions are also welcome.

Upvotes: 12

Views: 29996

Answers (3)

A. Webb

Reputation: 26446

This is not valid CSV, so you'll have to do your own parsing. But, assuming the convention is as follows, you can toggle the quote argument of scan field by field and still take advantage of most of its abilities:

  1. If the field starts with a quote, it is quoted.
  2. If the field does not start with a quote, it is raw.

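next_field <- function(stream) {
  # Peek at the next character without consuming it
  p <- seek(stream)
  d <- readChar(stream, 1)
  seek(stream, p)
  if (d == "\"") {
    # Field starts with a quote: let scan treat it as a quoted field
    field <- scan(stream, "", 1, sep = ",", quote = "\"", blank = FALSE)
  } else {
    # Otherwise read it raw, so a stray quote is kept as literal text
    field <- scan(stream, "", 1, sep = ",", quote = "", blank = FALSE)
  }
  return(field)
}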

Assuming the above convention, this is sufficient to parse the file as follows:

s<-file("example.csv",open="rt")
header<-readLines(s,1)
header<-scan(what="",text=header,sep=",")
line<-replicate(length(header),next_field(s))

setNames(as.data.frame(lapply(line,type.convert)),header)
  Type ID NAME           CONTENT                   RESPONSE GRADE SOURCE
1    A  3   NA I have comma, ha! I have open double quotes"     A     NA

However, in practice you might want to first write the fields back to another file, quoting each one, so that you can simply call read.csv on the corrected file.
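One way to sketch that rewrite step is shown below. It is not the code from this answer: it applies the same two-rule convention to each line as a string, and the helper names split_fields and quote_field are hypothetical. It also assumes a quoted field never contains a double quote itself, and it keeps the corrected lines in memory rather than writing them to a second file.

```r
# Split one line into fields: a field that starts with a double quote is
# quoted (ends at the next quote followed by a comma or end of line);
# any other field is raw and runs to the next comma, stray quotes included.
split_fields <- function(line) {
  fields <- character(0)
  rest <- line
  repeat {
    pattern <- if (startsWith(rest, "\"")) "^\"([^\"]*)\"(,|$)" else "^([^,]*)(,|$)"
    m <- regmatches(rest, regexec(pattern, rest))[[1]]
    if (length(m) == 0) stop("cannot parse: ", rest)
    fields <- c(fields, m[2])
    if (m[3] == "") break                     # no trailing comma: end of line
    rest <- substring(rest, nchar(m[1]) + 1)  # drop the consumed field and comma
  }
  fields
}

# Quote a field the standard CSV way, doubling any embedded double quote.
quote_field <- function(x) paste0("\"", gsub("\"", "\"\"", x, fixed = TRUE), "\"")

example <- c(
  'Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE',
  'A,3,"","I have comma, ha!",I have open double quotes",A,""'
)

# Rebuild every line with all fields quoted, then parse it as ordinary CSV.
fixed <- vapply(example, function(l) paste(quote_field(split_fields(l)), collapse = ","),
                character(1), USE.NAMES = FALSE)
df <- read.csv(text = fixed, stringsAsFactors = FALSE)
print(df)
```

For a real file, example would come from readLines on the input, and fixed could be saved with writeLines before calling read.csv on the new file.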

Upvotes: 2

eddi

Reputation: 49448

fread from data.table handles this just fine:

library(data.table)

fread('Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE
A,3,"","I have comma, ha!",I have open double quotes",A,""')
#   Type ID NAME           CONTENT                   RESPONSE GRADE SOURCE
#1:    A  3      I have comma, ha! I have open double quotes"     A       

Upvotes: 10

Buzz Lightyear

Reputation: 844

I'm not too sure about the structure of CSV files, but you said the author had escaped the comma in the text under content.

This reads the text as is, keeping the " at the end.

read.csv2("Test.csv", header = TRUE, sep = ",", quote = "")

Upvotes: 2
