Maximilian

Reputation: 4229

Issue with double quotes and fread function

I have some column entries that look like this:

c("This is just a \"shame\"...") # since it's a character

THIS WILL WRITE A FILE ON YOUR C:\ DRIVE:

sample.data <- data.frame(case1=c("This is just a 'shame'..."),
                          case2="This is just a shame") # I could not work out how to insert the double quotes here
write.csv(sample.data, file="C:/sample_data.csv")

require(data.table)
test.fread <- fread("C:/sample_data.csv")
test.read.csv <- read.csv("C:/sample_data.csv")
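A minimal sketch of how the double quotes could be embedded in the sample data, assuming backslash escaping inside the R string. The temporary file and the name raw.lines are assumptions made for illustration and replace the C:/ path used above. Printing the raw lines shows how write.csv stores the embedded quotes on disk.

```r
# Assumption: a temporary file is used instead of C:/sample_data.csv.
sample.data <- data.frame(case1 = "This is just a \"shame\"...",  # \" embeds a double quote
                          case2 = "This is just a shame",
                          stringsAsFactors = FALSE)

tmp <- tempfile(fileext = ".csv")
write.csv(sample.data, file = tmp, row.names = FALSE)

# Inspect the file as plain text: write.csv doubles embedded quotes ("").
raw.lines <- readLines(tmp)
print(raw.lines)

# read.csv understands the doubled quotes and restores the original string.
test.read.csv <- read.csv(tmp, stringsAsFactors = FALSE)
print(test.read.csv$case1)
```

The doubled quotes in the file are the likely trigger for the fread message below, although that is an assumption rather than something confirmed here.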

If I read the csv data with the fread function (from data.table), I get this error:

Bumped column 79 to type character on data row 12681, field contains '   
a.n."'. Coercing previously read values in this column from logical, 
integer or numeric back to character which may not be lossless; e.g., if 
'00' and '000' occurred before they will now be just '0', and there 
may be inconsistencies with treatment of ',,' and ',NA,' too (if they 
occurred in this column before the bump). If this matters please rerun 
and set 'colClasses' to 'character' for this column. Please note that column
type detection uses the first 5 rows, the middle 5 rows and the 
last 5 rows, so hopefully this message should be very rare. 
If reporting to datatable-help, please rerun and include 
the output from verbose=TRUE.

If I use read.csv no error occurs and the entries are read in correctly!

Question 1: How can I remove the double quotes inside the character string?

Question 2: Why does read.csv read the entries correctly while fread fails?

Upvotes: 2

Views: 1189

Answers (1)

Maximilian

Reputation: 4229

As @Arun kindly suggested, the data.table development version 1.9.5, currently on GitHub, may be of help here.

To install it, follow this procedure (Rtools required):

# To install development version

library(devtools)
install_github("Rdatatable/data.table", build_vignettes = FALSE)

I have tested this and can confirm that the newest version of data.table reads the entries containing double quotes without problems.

For further details and updates, see the data.table repository on GitHub.
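Regarding Question 1, here is a minimal sketch of one way to strip the double quotes once the column has been read as character. It uses base R's gsub with fixed = TRUE. The vector names x and x.clean are hypothetical and stand in for the affected column.

```r
# Hypothetical character vector standing in for the affected column.
x <- c("This is just a \"shame\"...", "This is just a shame")

# fixed = TRUE treats the pattern as a literal double quote, not a regex.
x.clean <- gsub("\"", "", x, fixed = TRUE)
print(x.clean)
```

The same call can be applied to a data frame or data.table column after reading the file.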

Upvotes: 2
