FooBar

Reputation: 16508

R: Warning when creating a (long) list of dummies

A dummy column for a column c and a given value x equals 1 if c==x and 0 otherwise. When creating dummies for a column c, one usually excludes one value x of one's choice, because its dummy column doesn't add any information beyond the dummy columns that already exist.
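A minimal base-R sketch of why one dummy is redundant, using a hypothetical toy vector `firm`: with a full one-hot encoding every row sums to 1, so the dropped column can always be recovered from the others. Base R's `model.matrix()` follows the same convention by default (treatment contrasts drop the first level).

```r
# Hypothetical toy column with three distinct values
firm <- c("a", "b", "c", "a", "b")

# Full one-hot encoding: one 0/1 column per distinct value
full <- sapply(sort(unique(firm)), function(x) as.integer(firm == x))

# Every row sums to 1, so any column equals 1 minus the sum of the others
print(rowSums(full))

# Dropping the first value's column loses no information
reduced <- full[, -1, drop = FALSE]
recovered_first <- as.integer(1L - rowSums(reduced))
print(identical(recovered_first, full[, 1]))

# model.matrix() drops the first level by default (alongside an intercept)
m <- model.matrix(~ firm)
print(colnames(m))
```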

Here's how I'm trying to create a long list of dummies for a column firm in a data.table:

library(data.table)
values <- unique(myDataTable$firm)
cols <- paste('d', as.character(values[-1]), sep='_') # gives us nice d_value names for columns
# the [-1]: I arbitrarily do not create a dummy for the first unique value
myDataTable[, (cols) := lapply(values[-1], function(x) firm == x)]
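A self-contained version of the same approach on hypothetical toy data, useful for checking that the column names and values come out as expected. Unlike the snippet above, this sketch wraps the comparison in `as.integer()` so the dummies are 0/1 rather than TRUE/FALSE; that choice is an assumption, not part of the original code.

```r
library(data.table)

# Hypothetical toy data standing in for myDataTable
myDataTable <- data.table(id = 1:6, firm = c(101, 102, 103, 101, 102, 104))

values <- unique(myDataTable$firm)
cols <- paste("d", as.character(values[-1]), sep = "_")

# One 0/1 column per firm, skipping the first unique value
myDataTable[, (cols) := lapply(values[-1], function(x) as.integer(firm == x))]
print(myDataTable)
```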

This code reliably worked for previous columns, which had fewer unique values. firm, however, has many more:

str(values)
 num [1:3082] 51560090 51570615 51603870 51604677 51606085 ...

I get a warning when trying to add the columns:

Warning message:
  truelength (6198) is greater than 1000 items over-allocated (length = 36). See ?truelength. If you didn't set the datatable.alloccol option very large, please report this to datatable-help including the result of sessionInfo().

As far as I can tell, all the columns I need are still there. Can I just ignore this warning? Will it slow down future computations? I'm not sure what to make of it or of the relevance of truelength.
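To see what truelength refers to, one can compare it with length on a small hypothetical table: `length()` counts the columns in use, while `truelength()` counts the column-pointer slots data.table has allocated, including spare ones reserved for adding columns by reference with `:=`. The exact number of spare slots depends on the data.table version and the `datatable.alloccol` option, so this sketch only prints it.

```r
library(data.table)

# Hypothetical small table
DT <- data.table(a = 1:3, b = letters[1:3])

# Columns in use vs. allocated column slots
print(length(DT))
print(truelength(DT))

# Spare slots still available for adding columns by reference
spare <- truelength(DT) - length(DT)
print(spare)
```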

Upvotes: 5

Views: 1011

Answers (1)

jangorecki

Reputation: 16697

Taking Arun's comment as an answer: use the alloc.col function to pre-allocate the required number of columns in your data.table, setting it to a number larger than the expected ncol.

alloc.col(myDataTable, 3200)
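A sketch of the same idea on hypothetical data with a few thousand distinct firms, sizing the allocation from the data instead of hard-coding it: existing columns, plus the new dummies, plus a small margin (the margin of 100 is an arbitrary assumption).

```r
library(data.table)

# Hypothetical table with many distinct firms
set.seed(1)
DT <- data.table(firm = sample(1:3000, 4000, replace = TRUE))
values <- unique(DT$firm)
cols <- paste("d", values[-1], sep = "_")

# Reserve column slots up front: existing columns + new dummies + margin
alloc.col(DT, ncol(DT) + length(cols) + 100L)

# Add the dummies by reference into the pre-allocated slots
DT[, (cols) := lapply(values[-1], function(x) as.integer(firm == x))]
print(ncol(DT))
```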

Additionally, depending on how you consume the data, I would recommend considering reshaping your wide table into a long table; see EAV (entity-attribute-value). Then you need only one column per data type.
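A minimal sketch of the wide-to-long reshape with data.table's `melt()`, on a hypothetical wide table of dummy columns: each id/attribute pair becomes one row, so the number of columns no longer grows with the number of distinct values. Note that the original firm column is itself already this long representation; the sketch only shows how an existing wide block maps onto it.

```r
library(data.table)

# Hypothetical wide table: an id plus 0/1 dummy columns
wide <- data.table(id = 1:3,
                   d_102 = c(1L, 0L, 0L),
                   d_103 = c(0L, 1L, 0L))

# Long (EAV-style) layout: one row per id/attribute pair
long <- melt(wide, id.vars = "id",
             variable.name = "attribute", value.name = "value")
print(long)
```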

Upvotes: 4
