nicshah

Reputation: 345

fread in data.table but too many columns

I've recently updated my data.table package, and I'm struggling with the fread() function. Previously (version 1.10.4-3), fread() could split the data I was reading in into columns. The newer version can't do this, and if I use the fill = TRUE argument it puts everything into one column.

The issue is that it detects there are 13 column names, but the new version of data.table tries to fill the other columns. Is there a way to still split the columns correctly?

I'm aware that other packages might be able to do this, but I'd prefer to use data.table if possible.

This is my input, which the new data.table can no longer delimit into columns correctly:

    id, value, other
    1, "("a"="b", "b"="c", "c"="d")", 2
    2, "("a"="b", "b"="c", "c"="d")", 3

Upvotes: 1

Views: 915

Answers (2)

Carlos Eduardo Lagosta

Reputation: 1001

The quote rules changed in version 1.10.6. They're now more robust and perform better, but they will not handle unbalanced quotes and some other cases. See the details on quotes in the current documentation of fread.

As an alternative, you can use functions that rely on scan to handle quotes inside quotes, such as read.table:

    read.table("example.txt", sep = ",", header = TRUE)

Or, as answered by @jared-mamrot, use vroom for better performance and convert the result to a data.table afterwards with setDT.
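A minimal sketch of the vroom-then-setDT route, assuming both packages are installed. The temporary file and the object name `dat` are only illustrative; the sample rows are copied from the question, and the parsed result may differ for other inputs or package versions.

```r
library(vroom)
library(data.table)

# Write the sample from the question to a temporary file (illustrative path).
path <- tempfile(fileext = ".txt")
writeLines(c(
  'id, value, other',
  '1, "("a"="b", "b"="c", "c"="d")", 2',
  '2, "("a"="b", "b"="c", "c"="d")", 3'
), path)

dat <- vroom(file = path)  # vroom returns a tibble
setDT(dat)                 # convert to a data.table by reference (no copy)
class(dat)
```

setDT modifies the object in place, so there is no need to assign its result; after the call, `dat` can be used with the usual data.table syntax.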

Upvotes: 2

jared_mamrot

Reputation: 26225

vroom handles your test case without adding extra columns, e.g.:

    library(vroom)
    test <- vroom(file = "test.txt")
    test
    # A tibble: 2 x 3
         id value           other
      <dbl> <chr>           <dbl>
    1     1 (a=b, b=c, c=d)     2
    2     2 (a=b, b=c, c=d)     3

EDIT

To keep the quotation marks:

    library(vroom)
    test <- vroom(file = "test.txt", escape_double = FALSE)
    test
    # A tibble: 2 x 3
         id value                       other
      <dbl> <chr>                       <dbl>
    1     1 ("a"="b", "b"="c", "c"="d")     2
    2     2 ("a"="b", "b"="c", "c"="d")     3

Upvotes: 5
