Vivek
Vivek

Reputation: 107

get a listing of unique count of occurrences for a set of variables in a data.table

I want to get a listing of unique count of occurrences for a set of variables in a data.table. The following code works. I was just wondering if it is possible to generalise this with a function. It may then have more general applicability for data of moderate size.

packageVersion('data.table')
[1] ‘1.9.7’
library(data.table)
DT <- data.table(x1=c("b","c", "a", "b", "a", "b",'b'), x2=as.character(c(1:6,1)),m1=c(seq(10,60,by=10),10),m2=1:7)
DT
> DT
   x1 x2 m1 m2
1:  b  1 10  1
2:  c  2 20  2
3:  a  3 30  3
4:  b  4 40  4
5:  a  5 50  5
6:  b  6 60  6
7:  b  1 10  7

###get unique counts of each occurrence of first variable x1
setkey(DT,x1)
u1<-DT[,.(uN1=uniqueN(.SD)),by=x1]
U1<-u1[DT]
U1
> U1
   x1 uN1 x2 m1 m2
1:  a   2  3 30  3
2:  a   2  5 50  5
3:  b   4  1 10  1
4:  b   4  4 40  4
5:  b   4  6 60  6
6:  b   4  1 10  7
7:  c   1  2 20  2

###unique counts of (x1,x2)
setkey(U1,x1,x2)
u2<-U1[,.(uN2=uniqueN(.SD)),by=.(x1,x2)]
U2<-u2[U1]
U2
> U2
   x1 x2 uN2 uN1 m1 m2
1:  a  3   1   2 30  3
2:  a  5   1   2 50  5
3:  b  1   2   4 10  1
4:  b  1   2   4 10  7
5:  b  4   1   4 40  4
6:  b  6   1   4 60  6
7:  c  2   1   1 20  2

###unique counts of (x1,x2,m1)
setkey(U2,x1,x2,m1)
u3<-U2[,.(uN3=uniqueN(.SD)),by=.(x1,x2,m1)]
U3<-u3[U2]
# desired order
setcolorder(U3,c('x1','uN1','x2','uN2','m1','uN3','m2'))
U3
> U3
   x1 uN1 x2 uN2 m1 uN3 m2
1:  a   2  3   1 30   1  3
2:  a   2  5   1 50   1  5
3:  b   4  1   2 10   2  1
4:  b   4  1   2 10   2  7
5:  b   4  4   1 40   1  4
6:  b   4  6   1 60   1  6
7:  c   1  2   1 20   1  2

The proposed function can be something like:

UniqueCombN(DT, listX)

where listX is a list of the variables of interest in the data table.

Upvotes: 2

Views: 85

Answers (1)

Jaap
Jaap

Reputation: 83215

The several joins you used are not needed. You can achieve the same result by updating your data.table by reference with:

DT[, uN1:=uniqueN(.SD), x1
   ][, uN2:=uniqueN(.SD), .(x1,x2)
     ][, uN3:=uniqueN(.SD), .(x1,x2,m1)]

gives:

> DT
   x1 x2 m1 m2 uN1 uN2 uN3
1:  b  1 10  1   4   2   2
2:  c  2 20  2   1   1   1
3:  a  3 30  3   2   1   1
4:  b  4 40  4   4   1   1
5:  a  5 50  5   2   1   1
6:  b  6 60  6   4   1   1
7:  b  1 10  7   4   2   2

If you want to set the order, you can use for example:

setorder(DT, x1, x2)

which gives:

> DT
   x1 x2 m1 m2 uN1 uN2 uN3
1:  a  3 30  3   2   1   1
2:  a  5 50  5   2   1   1
3:  b  1 10  1   4   2   2
4:  b  1 10  7   4   2   2
5:  b  4 40  4   4   1   1
6:  b  6 60  6   4   1   1
7:  c  2 20  2   1   1   1

Arranging the columns in a different order can be done like you did in your question:

setcolorder(DT, c('x1','uN1','x2','uN2','m1','uN3','m2'))

Upvotes: 2

Related Questions