fread in data.table but too many columns

Question

I've recently updated my data.table package, and I'm struggling with the fread() function. Previously (version 1.10.4-3), fread() could split the data I was reading into columns. The newer version can't do this, and if I use the fill = TRUE argument it puts everything into one column.

The issue is that it detects 13 column names, but the new version of data.table then tries to fill the other columns. Is there a way to still get the columns split correctly?

I'm aware other packages might handle this, but I'd prefer to use data.table if possible.

This is my input, which the new data.table can no longer split into columns correctly:

    id, value, other
    1, "("a"="b", "b"="c", "c"="d")", 2
    2, "("a"="b", "b"="c", "c"="d")", 3

Carlos Eduardo Lagosta · Accepted Answer

The quote rules changed from version 1.10.6 onward. They are now more robust and faster, but they no longer handle unbalanced quotes and some other cases. See the details on quotes in the current documentation of fread.

As an alternative, you can use functions that rely on scan to handle quotes inside quotes, like read.table:

    read.table("example.txt", sep = ",", header = TRUE)

Or, as answered by @jared-mamrot, use vroom for better performance and convert the result to a data.table afterwards with setDT.
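For anyone who wants to try the read.table route end to end, here is a minimal sketch. It assumes the sample from the question is representative of the real file; the temporary file and the variable names (tmp, df) are illustrative only. The conversion step uses setDT and is skipped if data.table is not installed.

```r
# Write the sample from the question to a temporary file
# (file name and contents are illustrative, copied from the question).
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  'id, value, other',
  '1, "("a"="b", "b"="c", "c"="d")", 2',
  '2, "("a"="b", "b"="c", "c"="d")", 3'
), tmp)

# read.table parses fields with scan(), which copes with the nested quotes here
df <- read.table(tmp, sep = ",", header = TRUE)

# Convert to a data.table by reference, if the package is available
if (requireNamespace("data.table", quietly = TRUE)) {
  data.table::setDT(df)
}

# Inspect the column types and values
str(df)
```

Note that the quote characters are consumed during parsing, so the value column will not contain them. If the real file has unbalanced quotes on some rows, this approach may still misparse those rows, so it is worth checking the column count after reading.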
