Reputation: 6738
Sample code:
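library(data.table)
set.seed(42)
# simplify = FALSE keeps the result a list even if every sample has the same length
dt <- data.table(id = LETTERS[1:20],
                 setvalues = replicate(20,
                                       sample(letters[1:4], sample(c(2, 3), 1)),
                                       simplify = FALSE))[order(id)]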
dt
id setvalues
1: A d,a,b
2: B c,d,a
3: C c,b,d
4: D b,d,c
5: E a,b,c
6: F a,c,b
7: G c,b
8: H b,c,d
9: I b,c,a
10: J a,d,b
11: K b,d,a
12: L b,c,d
13: M d,b,a
14: N b,c
15: O c,d
16: P b,d
17: Q d,c,b
18: R a,d,b
19: S a,d,c
20: T b,a
How can I count the occurrences of each set (order doesn't matter)?
The desired result is something like:
setvalue counts
b,c,d 6
a,b,d 4
a,b,c 3
a,c,d 2
b,c 2
c,d 1
b,d 1
a,b 1
Upvotes: 1
Views: 41
Reputation: 887651
The 'setvalues' column is a list of vectors. We loop through the list with lapply, sort each element, paste it into a single string (here with toString), use the result in the by argument, and get the 'counts' with .N:
dt[ , .(counts = .N), .(setvalue = unlist(lapply(setvalues, function(x) toString(sort(x)))))]
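The same idea can be sketched step by step on a small table, which may make it easier to see why the order within each set stops mattering. The five rows below are hypothetical, written by hand rather than taken from the question's random sample. Two things are swapped in as assumptions: paste(..., collapse = ",") replaces toString() so the keys have no space after the comma, as in the desired output, and vapply() replaces unlist(lapply()) so the key is always one string per row. The final order(-counts) is also an addition, to list the most frequent sets first.

```r
library(data.table)

# Hypothetical hand-written rows; 'setvalues' is a list column of character vectors.
dt <- data.table(
  id = c("A", "B", "C", "D", "E"),
  setvalues = list(c("d", "a", "b"),
                   c("b", "d", "a"),
                   c("c", "b"),
                   c("b", "c"),
                   c("c", "d"))
)

# Canonical key per row: sort the elements, then collapse them into one string.
# Sorting first means c("d", "a", "b") and c("b", "d", "a") yield the same key.
dt[, setvalue := vapply(setvalues,
                        function(x) paste(sort(x), collapse = ","),
                        character(1))]

# Count rows per key with .N, then order by descending count.
res <- dt[, .(counts = .N), by = setvalue][order(-counts)]
print(res)
```

Storing the key in its own column is optional; it just makes the intermediate step visible before grouping.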
Upvotes: 2