Reputation: 1491
I have no experience with data.table, so I don't know if there is a solution to my question (30 minutes on Google gave no answer at least), but here it goes.
With data.frame I often use the following command to check the number of observations of a unique value:
df$Obs=with(df, ave(v1, ID-Date, FUN=function(x) length(unique(x))))
Is there any corresponding method when working with data.table?
Upvotes: 4
Views: 4171
Reputation: 59612
Yes, there is. Happily, you've asked about one of the newest features of data.table
, added in v1.8.2 :
:=
by group is now implemented (FR#1491) and sub-assigning to a new column by reference now adds the column automatically (initialized withNA
where the sub-assign doesn't touch) (FR#1997).:=
by group can be combined with all types ofi
, so:=
by group includes grouping byi
as well as byby
. Since:=
by group is by reference, it should be significantly faster than any method that (directly or indirectly)cbind
s the grouped results to DT, since no copy of the (large) DT is made at all. It's a short and natural syntax that can be compounded with other queries.
DT[,newcol:=sum(colB),by=colA]
In your example, iiuc, it should be something like :
DT[, Obs:=.N, by=ID-Date]
instead of :
df$Obs=with(df, ave(v1, ID-Date, FUN=function(x) length(unique(x))))
Note that :=
by group scales well for large data sets (and smaller datasets will a lot of small groups).
See ?":="
and Search data.table tag for "reference"
Upvotes: 5