Andreas

Reputation: 6738

How to count occurrences of different sets

Sample code:

library(data.table)
set.seed(42)
dt <- data.table(id = LETTERS[1:20],
             setvalues = replicate(20,
                sample(letters[1:4], sample(c(2,3),1))))[order(id)]

dt

    id setvalues
 1:  A     d,a,b
 2:  B     c,d,a
 3:  C     c,b,d
 4:  D     b,d,c
 5:  E     a,b,c
 6:  F     a,c,b
 7:  G       c,b
 8:  H     b,c,d
 9:  I     b,c,a
10:  J     a,d,b
11:  K     b,d,a
12:  L     b,c,d
13:  M     d,b,a
14:  N       b,c
15:  O       c,d
16:  P       b,d
17:  Q     d,c,b
18:  R     a,d,b
19:  S     a,d,c
20:  T       b,a

How can I count the occurrences of each set, where the order of the elements within a set doesn't matter?

The desired result is something like:

setvalue counts
b,c,d    6
a,b,d    4
a,b,c    3
a,c,d    2
b,c      2
c,d      1
b,d      1
a,b      1

Upvotes: 1

Views: 41

Answers (1)

akrun

Reputation: 887651

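The 'setvalues' column is a list of vectors. We loop over the list with lapply, sort each element so that order no longer matters, collapse it into a single string with toString, use that string in the by argument, and get the 'counts' with .N. Note that toString separates the elements with ", ", so the keys look like "b, c, d".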

dt[ , .(counts = .N), .(setvalue = unlist(lapply(setvalues, function(x) toString(sort(x)))))]
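
A variation on the same idea, as a minimal sketch rather than a definitive solution: it stores the sorted key in a new column, uses paste(..., collapse = ",") so the keys have no space after the comma, and orders the result by descending count. The small table below is hypothetical data built by hand, not the question's seeded sample, and the names 'setvalue' and 'res' are only illustrative.

```r
library(data.table)

# Hypothetical hand-built table with a list column (not the question's data)
dt <- data.table(
  id = c("A", "B", "C", "D", "E"),
  setvalues = list(c("d", "a", "b"), c("b", "a", "d"), c("c", "b"),
                   c("b", "c"), c("a", "b"))
)

# Order-independent key: sort each vector, then collapse it into one string.
# vapply() guarantees exactly one character value per row.
dt[, setvalue := vapply(setvalues,
                        function(x) paste(sort(x), collapse = ","),
                        character(1))]

# Count rows per key, most frequent sets first
res <- dt[, .(counts = .N), by = setvalue][order(-counts)]
res
```

Keeping the key as a column makes it easy to inspect which rows fall into which set before counting.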

Upvotes: 2
