Read quoted values in txt file with data.table::fread()

I have a simple txt file (values are quoted and separated by tabs):

"Col1" "Col2" "Col3"  
"A" "1,1" "C"  
"B" "2,1" "C"  
"C" "3,1" "C"  

I would like to read the file using fread(). Since the middle column should be numeric, I use dec = ",".

However, the command:

fread("myFile.txt", sep = "\t", dec = ",", header = TRUE, stringsAsFactors = FALSE)

fails to read Col2 as numeric. Specifying colClasses = c("character", "numeric", "character") does not make any difference.

Is there a way to accurately read the file using fread() (without post-processing)?

Any help would be greatly appreciated.

Upvotes: 1

Views: 4593

Answers (1)

nrussell

Reputation: 18602

I'm going to backtrack a little bit on my previous comments; it looks like read.table does handle this situation successfully.

Demonstrating with the following object,

df <- data.frame(
    Col1 = LETTERS[1:3], 
    Col2 = sub(".", ",", 1:3 + 0.1, fixed = TRUE), 
    Col3 = rep("C", 3), 
    stringsAsFactors = FALSE
)

which looks like this when written out as quoted, tab-separated text:

write.table(
    df,
    sep = "\t", 
    row.names = FALSE
)
# "Col1"    "Col2"  "Col3"
# "A"   "1,1"   "C"
# "B"   "2,1"   "C"
# "C"   "3,1"   "C"

Writing this to a temporary file,

tf <- tempfile()
write.table(
    df,
    file = tf,
    sep = "\t", 
    row.names = FALSE
)

read.table will process the second column as numeric when the proper arguments are provided:

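# stringsAsFactors = FALSE keeps the output below the same on R < 4.0.0 as well
str(read.table(tf, header = TRUE, sep = "\t", dec = ",", stringsAsFactors = FALSE))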
# 'data.frame': 3 obs. of  3 variables:
#  $ Col1: chr  "A" "B" "C"
#  $ Col2: num  1.1 2.1 3.1
#  $ Col3: chr  "C" "C" "C"

More conveniently, read.delim2 can be used instead, since it already defaults to sep = "\t" and dec = ",":

str(read.delim2(tf, header = TRUE))
# 'data.frame': 3 obs. of  3 variables:
#  $ Col1: chr  "A" "B" "C"
#  $ Col2: num  1.1 2.1 3.1
#  $ Col3: chr  "C" "C" "C"

I can't really say why fread does not handle this, but if it is a sufficiently common scenario, the package maintainers may want to account for it. You might consider opening an issue on the data.table GitHub repository to ask about it.
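If the end goal is a data.table, here is a rough sketch of two possible fallbacks, assuming the data.table package is installed. The object names (tf, dt_base, dt_fread) are only illustrative, and the second option does involve a small conversion step, so it is not the "no post-processing" solution asked for.

```r
library(data.table)

# Recreate the sample file: quoted values, tab-separated, decimal comma.
tf <- tempfile(fileext = ".txt")
writeLines(c(
  '"Col1"\t"Col2"\t"Col3"',
  '"A"\t"1,1"\t"C"',
  '"B"\t"2,1"\t"C"',
  '"C"\t"3,1"\t"C"'
), tf)

# Option 1: let base R parse the decimal commas, then convert to a data.table.
dt_base <- as.data.table(read.delim2(tf, stringsAsFactors = FALSE))

# Option 2: read every column as character with fread, then convert Col2
# by swapping the decimal comma for a point (this is a post-processing step).
dt_fread <- fread(tf, sep = "\t", header = TRUE, colClasses = "character")
dt_fread[, Col2 := as.numeric(sub(",", ".", Col2, fixed = TRUE))]

str(dt_base)
str(dt_fread)
```

Option 1 keeps all of the parsing in read.delim2, so it is probably the closer match to the question; Option 2 may be preferable for large files where fread's speed matters.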

Upvotes: 2
