Reputation: 16508
A dummy column for a column c
and a given value x
equals 1
if c==x
and 0 else. Usually, by creating dummies for a column c
, one excludes one value x
at choice, as the last dummy column doesn't add any information w.r.t. the already existing dummy columns.
Here's how I'm trying to create a long list of dummies for a column firm
, in a data.table
:
values <- unique(myDataTable$firm)
cols <- paste('d',as.character(inds[-1]), sep='_') # gives us nice d_value names for columns
# the [-1]: I arbitrarily do not create a dummy for the first unique value
myDataTable[, (cols):=lapply(values[-1],function(x)firm==x)]
This code reliably worked for previous columns, which had smaller unique values. firm
however is larger:
tr(values)
num [1:3082] 51560090 51570615 51603870 51604677 51606085 ...
I get a warning when trying to add the columns:
Warning message:
truelength (6198) is greater than 1000 items over-allocated (length = 36). See ?truelength. If you didn't set the datatable.alloccol option very large, please report this to datatable-help including the result of sessionInfo().
As far as I can tell, there is still all columns that I need. Can I just ignore this issue? Will it slow down future computations? I'm not sure what to make of this and the relevant of truelength
.
Upvotes: 5
Views: 1011
Reputation: 16697
Taking Arun's comment as an answer.
You should use alloc.col
function to pre-allocate required amount of columns in your data.table to the number which will be bigger than expected ncol.
alloc.col(myDataTable, 3200)
Additionally depending on the way how you consume the data I would recommend to consider reshaping your wide table to long table, see EAV. Then you need to have only one column per data type.
Upvotes: 4