ego_
ego_

Reputation: 1491

Create new column in data.table by group

I have no experience with data.table, so I don't know if there is a solution to my question (30 minutes on Google gave no answer at least), but here it goes.

With data.frame I often use the following command to check the number of observations of a unique value:

df$Obs=with(df, ave(v1, ID-Date, FUN=function(x) length(unique(x))))  

Is there any corresponding method when working with data.table?

Upvotes: 4

Views: 4171

Answers (1)

Matt Dowle
Matt Dowle

Reputation: 59612

Yes, there is. Happily, you've asked about one of the newest features of data.table, added in v1.8.2 :

:= by group is now implemented (FR#1491) and sub-assigning to a new column by reference now adds the column automatically (initialized with NA where the sub-assign doesn't touch) (FR#1997). := by group can be combined with all types of i, so := by group includes grouping by i as well as by by. Since := by group is by reference, it should be significantly faster than any method that (directly or indirectly) cbinds the grouped results to DT, since no copy of the (large) DT is made at all. It's a short and natural syntax that can be compounded with other queries.
DT[,newcol:=sum(colB),by=colA]

In your example, iiuc, it should be something like :

DT[, Obs:=.N, by=ID-Date]

instead of :

df$Obs=with(df, ave(v1, ID-Date, FUN=function(x) length(unique(x))))

Note that := by group scales well for large data sets (and smaller datasets will a lot of small groups).

See ?":=" and Search data.table tag for "reference"

Upvotes: 5

Related Questions