I have a simple txt file (values are quoted and separated by tabs):
"Col1" "Col2" "Col3"
"A" "1,1" "C"
"B" "2,1" "C"
"C" "3,1" "C"
I would like to read the file using fread(). Since the middle column should be numeric, I use dec = ",".
However, the command:
fread("myFile.txt", sep = "\t", dec = ",", header = TRUE, stringsAsFactors = FALSE)
fails to read Col2 as numeric. Specifying colClasses = c("character", "numeric", "character") does not make any difference.
Is there a way to read the file correctly using fread() (without post-processing)?
Any help would be greatly appreciated.
I'm going to backtrack a little on my previous comments; it looks like read.table does handle this situation successfully.
Demonstrating with the following object,
df <- data.frame(
  Col1 = LETTERS[1:3],
  Col2 = sub(".", ",", 1:3 + 0.1, fixed = TRUE),
  Col3 = rep("C", 3),
  stringsAsFactors = FALSE
)
which looks like this on disk:
write.table(
  df,
  sep = "\t",
  row.names = FALSE
)
# "Col1" "Col2" "Col3"
# "A" "1,1" "C"
# "B" "2,1" "C"
# "C" "3,1" "C"
Writing this to a temporary file,
tf <- tempfile()
write.table(
  df,
  file = tf,
  sep = "\t",
  row.names = FALSE
)
read.table will process the second column as numeric when the proper arguments are provided:
str(read.table(tf, header = TRUE, sep = "\t", dec = ","))
# 'data.frame': 3 obs. of 3 variables:
# $ Col1: chr "A" "B" "C"
# $ Col2: num 1.1 2.1 3.1
# $ Col3: chr "C" "C" "C"
More conveniently, read.delim2 may also be used:
str(read.delim2(tf, header = TRUE))
# 'data.frame': 3 obs. of 3 variables:
# $ Col1: chr "A" "B" "C"
# $ Col2: num 1.1 2.1 3.1
# $ Col3: chr "C" "C" "C"
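The two calls give the same result here because read.delim2 is a thin wrapper around read.table whose defaults include header = TRUE, sep = "\t" and dec = ",". A minimal sketch to check this, recreating the file from the question with writeLines (the variable names are illustrative, not from the original post):

```r
# Recreate the tab-separated, quoted file from the question
tf <- tempfile()
writeLines(c(
  '"Col1"\t"Col2"\t"Col3"',
  '"A"\t"1,1"\t"C"',
  '"B"\t"2,1"\t"C"',
  '"C"\t"3,1"\t"C"'
), tf)

# Explicit arguments versus the read.delim2 defaults
via_table  <- read.table(tf, header = TRUE, sep = "\t", dec = ",")
via_delim2 <- read.delim2(tf)

# Both should report Col2 as numeric and agree with each other
is.numeric(via_table$Col2)
all.equal(via_table, via_delim2)
```

Note that neither call passes colClasses; the numeric type is inferred after the quotes have been stripped.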
I can't really say why fread does not handle this, but if it is a sufficiently common scenario, the package maintainers may want to account for it. You might consider opening an issue on the GitHub repository to ask about it.
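If fread has to stay in the pipeline, one fallback (which does involve post-processing, so it is not quite what was asked for) is to let Col2 come in as character and convert it afterwards. A rough sketch in base R, assuming there are no thousands separators in the column; comma_to_numeric is a hypothetical helper, and the data frame below is only a stand-in for whatever fread returns:

```r
# Hypothetical helper (not part of base R or data.table):
# swap the decimal comma for a point, then coerce to numeric
comma_to_numeric <- function(x) {
  as.numeric(sub(",", ".", x, fixed = TRUE))
}

# Stand-in for a result in which Col2 was left as character
res <- data.frame(
  Col1 = c("A", "B", "C"),
  Col2 = c("1,1", "2,1", "3,1"),
  Col3 = c("C", "C", "C"),
  stringsAsFactors = FALSE
)

res$Col2 <- comma_to_numeric(res$Col2)
str(res)
```

sub() replaces only the first comma in each value, which is enough for plain decimals but would need revisiting if the data also used commas for grouping.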