Gio Circo
Gio Circo

Reputation: 352

Sum of unique column combinions

I am attempting to do two things to a dataset I currently have:

    ID    IV1 DV1 DV2 DV3 DV4 DV5 DV6 DV7
1  97330   3   0   0   0   0   0   1   0
2 118619   0   0   0   0   0   1   1   0
3 101623   2   0   0   0   0   0   0   0
4 202626   0   0   0   0   0   0   0   0
5 182925   1   1   0   0   0   0   0   0
6 179278   1   0   0   0   0   0   0   0
  1. Find the unique number of column combinations of 7 binary independent variables (DV1 - DV7)
  2. Find the sum of an independent count variable (IV1) by each unique group.

I have been able to determine the number of unique column combinations by using the following:

uniq <- unique(dat[,c('DV1','DV2','DV3','DV4','DV5','DV6','DV7')])

This indicates there are 101 unique combinations present in the dataset. What I haven't been able to figure out is how to determine how to sum the variable "IV1" by each unique group. I've been reading around on this site, and I'm fairly certain there is an easy dplyr answer to this, but it's eluded me so far.

NOTE: I'm essentially trying to find an R solution to perform a "conjunctive analysis" which is displayed in this paper. There is sample code for SPSS, SAS and STATA at the end of the paper.

Upvotes: 0

Views: 110

Answers (2)

Chris
Chris

Reputation: 6372

Here's a reproducible example:

library(data.table)
DT <- data.table(X = c(1, 1, 1 , 1), Y = c(2, 2 , 3 , 4), Z = c(1,1,3,1))

Where X, Y ... are your columns.

Then use the Reduce function:

DT[, join_grp := Reduce(paste,list(X,Y,Z))]

This gives:

DT
   X Y Z join_grp
1: 1 2 1    1 2 1
2: 1 2 1    1 2 1
3: 1 3 3    1 3 3
4: 1 4 1    1 4 1

And we can find:

unique(DT[, join_grp])
[1] "1 2 1" "1 3 3" "1 4 1"

For the sums:

DT[ , sum(X), by = join_grp]

Just put whatever column you want to sum in place of the X

Concise Solution

DT[, join_grp := Reduce(paste,list(X,Y,Z))][ , sum(X), by = join_grp]

or

DT[ , sum(X), by = list(Reduce(paste,list(X,Y,Z)))]

Upvotes: 1

Gregor Thomas
Gregor Thomas

Reputation: 145785

library(dplyr)
group_by(dat, DV1, DV2, DV3, DV4, DV5, DV6, DV7) %>%
    summarize(sumIV1 = sum(IV1))

The number of rows in the result is the number of unique combinations present in your data. The sumIV1 column, of course, has the group-wise sum of IV1.

Thanks to Frank in the comments, we can use strings with group_by_ to simplify:

group_by_(dat, .dots = paste0("DV", 1:7)) %>%
         summarize(sumIV1 = sum(IV1))

Upvotes: 3

Related Questions