Reputation: 345
I've recently updated my data.table package, and I'm struggling with the fread() function. Previously (version 1.10.4-3), fread() could split the data I was reading in into columns. The newer version can't do this, and if I use the fill = TRUE argument it puts everything into one column.
The issue is that it detects there are 13 column names, but the new version of data.table tries to fill the other columns. Is there a way to still do this?
This is my input, but new data.table can no longer delimit the columns correctly.
I'm aware there might be other packages that might do this, but I'd prefer to use data.table if possible.
id, value, other
1, "("a"="b", "b"="c", "c"="d")", 2
2, "("a"="b", "b"="c", "c"="d")", 3
Upvotes: 1
Views: 915
Reputation: 1001
The quote rules changed in version 1.10.6. They're now more robust and perform better, but they will not handle unbalanced quotes and some other cases. See the details on quotes in the current fread documentation.
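To see which side of that change an installation is on, a quick check along these lines may help. It is only a sketch: the variable names are made up for illustration, and the 1.10.6 threshold is taken from the statement above rather than verified against the changelog.

```r
# Compare the installed data.table version with 1.10.6, the release
# named above as the point where the quote rules changed.
installed <- packageVersion("data.table")

# TRUE means the newer quote handling is presumably in effect
# (assumption: the change landed exactly in 1.10.6).
uses_new_quote_rules <- installed >= "1.10.6"

print(installed)
print(uses_new_quote_rules)
```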
As an alternative, you can use functions that rely on scan to handle quotes inside quotes, such as read.table:
read.table("example.txt", sep = ",", header = TRUE)
Or, as answered by @jared-mamrot, use vroom for better performance, then convert the result to a data.table with setDT.
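A minimal sketch of that vroom-then-setDT route, assuming both packages are installed. The temporary file and the explicit delim = "," are additions for the sake of a self-contained example and are not part of either answer; how the quoted field is parsed may depend on the vroom version.

```r
library(vroom)
library(data.table)

# Write the sample from the question to a temporary file
# (hypothetical stand-in for the real input file).
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  'id, value, other',
  '1, "("a"="b", "b"="c", "c"="d")", 2',
  '2, "("a"="b", "b"="c", "c"="d")", 3'
), tmp)

# Read with vroom; delim is set explicitly here so no guessing is needed.
dt <- vroom(tmp, delim = ",", escape_double = FALSE)

# Convert the tibble to a data.table by reference (no copy).
setDT(dt)

print(class(dt))
```

Because setDT works by reference, dt itself becomes a data.table and does not need to be reassigned.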
Upvotes: 2
Reputation: 26225
Vroom handles your test-case without adding additional columns, e.g.
library(vroom)
test <- vroom(file = "test.txt")
test
# A tibble: 2 x 3
     id value           other
  <dbl> <chr>           <dbl>
1     1 (a=b, b=c, c=d)     2
2     2 (a=b, b=c, c=d)     3
EDIT
To keep the quotation marks:
library(vroom)
test <- vroom(file = "test.txt", escape_double = FALSE)
test
# A tibble: 2 x 3
     id value                       other
  <dbl> <chr>                       <dbl>
1     1 ("a"="b", "b"="c", "c"="d")     2
2     2 ("a"="b", "b"="c", "c"="d")     3
Upvotes: 5