Reputation: 502
I have a data.table
with data grouped by 2 columns key1
and key2
. What I would like to achieve is to get a single integer key enumerating over all present pairs of (key1, key2)
. E.g. to a dataset that looks like this:
key1 key2 data
A 1 ...
A 1 ...
A 2 ...
C 1 ...
B 1 ...
B 1 ...
C 1 ...
I would like to have an id
column added that would enumerate things:
id key1 key2 data
1 A 1 ...
1 A 1 ...
2 A 2 ...
3 C 1 ...
4 B 1 ...
4 B 1 ...
3 C 1 ...
I don't care the order in which id
s are distributed, I only need to be able to subset the table based on single number, not a pair of key columns. How can this be achieved?
Upvotes: 0
Views: 254
Reputation: 34733
There are a couple of options.
You could do either:
dt[unique(dt,by=c("key1","key2"))[,id:=.I],id:=i.id,on=c("key1","key2")]
Or (as gracefully pointed out by @Khashaa in the comments; in fact, this is probably the way favored by @Arun, given the function was created for cases just like this)
setkey(dt,key1,key2)[,id:=rleid(key1,key2)]
Note that the output from the second will be different from the first, in general--the latter will be sorted by key1
and key2
; this isn't necessarily the case for the former. This may or may not be more efficient, depending on your data. If you want to force the first case to be sorted, simply amend to [order(key1,key2),id:=.I]
Upvotes: 2