Jassy.W

Reputation: 539

Calculate sum of one column based on another column

I have a data frame:

Y  X1  X2  X3
1   1   0  1
1   0   1  1
0   1   0  1
0   0   0  1
1   1   1  0
0   1   1  0

I want to sum the Y column over the rows where another column equals 1, i.e. sum(Y=1|Xi=1). For example, for column X1, s1 = sum(Y=1|X1=1) = 1 + 0 + 1 + 0 = 2:

Y  X1
1   1
0   1
1   1
0   1

For the X2 column, s2 = sum(Y=1|X2=1) = 0 + 1 + 0 = 1:

    Y   X2
    0   1
    1   1
    0   1

For the X3 column, s3 = sum(Y=1|X3=1) = 1 + 1 + 0 + 0 = 2:

    Y    X3
    1    1
    1    1
    0    1
    0    1

I have a rough idea to use apply(df, 2, sum) on the columns of the data frame, but I don't know how to subset each column based on Xi and then calculate the sum of Y. Any help is appreciated!

Upvotes: 5

Views: 8870

Answers (3)

M--

Reputation: 28825

There are numerous ways to do this. One is to subset the rows where the column you want equals 1 and then sum Y over them:

sum(df[df$X1==1,]$Y)

This should work for you.
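To make the one-liner above easy to try, here is a minimal sketch that rebuilds the question's data as `df` (the name used in this answer) and computes the X1 sum two ways. The variable names `s1_rows` and `s1_vec` are only illustrative.

```r
# Sample data copied from the question
df <- data.frame(
  Y  = c(1, 1, 0, 0, 1, 0),
  X1 = c(1, 0, 1, 0, 1, 1),
  X2 = c(0, 1, 0, 0, 1, 1),
  X3 = c(1, 1, 1, 1, 0, 0)
)

# Subset the rows where X1 == 1, then sum the Y column of that subset
s1_rows <- sum(df[df$X1 == 1, ]$Y)

# Same result, indexing the Y vector directly with the logical condition
s1_vec <- sum(df$Y[df$X1 == 1])

print(s1_rows)  # 2
print(s1_vec)   # 2
```

Swapping `X1` for `X2` or `X3` gives the sums for the other columns.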

Upvotes: 6

d.b

Reputation: 32548

Here's one more approach that you could modify to sum elements corresponding to 1, 0, or some other value.

sapply(x[,-1], function(a) sum(x$Y[a == 1]))
#X1 X2 X3 
# 2  2  2 
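As a sketch of the modification mentioned above, the matching value can be turned into a function argument. The helper name `sum_y_where` and its `value` and `target` parameters are hypothetical, introduced here only for illustration, and `x` is assumed to hold the question's data.

```r
# Sample data copied from the question
x <- data.frame(
  Y  = c(1, 1, 0, 0, 1, 0),
  X1 = c(1, 0, 1, 0, 1, 1),
  X2 = c(0, 1, 0, 0, 1, 1),
  X3 = c(1, 1, 1, 1, 0, 0)
)

# Hypothetical helper: for every column except `target`, sum the target
# column over the rows where that column equals `value`
sum_y_where <- function(data, value = 1, target = "Y") {
  predictors <- data[, setdiff(names(data), target), drop = FALSE]
  sapply(predictors, function(a) sum(data[[target]][a == value]))
}

sums_for_1 <- sum_y_where(x, value = 1)
sums_for_0 <- sum_y_where(x, value = 0)

print(sums_for_1)  # X1 = 2, X2 = 2, X3 = 2
print(sums_for_0)  # X1 = 1, X2 = 1, X3 = 1
```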

Upvotes: 2

Pierre Lapointe

Reputation: 16277

You can use colSums to count the rows where Y*X equals 1. I think there's an error in your desired output for the X2 column: rows 2 and 5 contain 1 for both Y and X2, so the sum should be 2.

x=read.table(text="Y  X1  X2  X3
1   1   0  1
1   0   1  1
0   1   0  1
0   0   0  1
1   1   1  0
0   1   1  0",header=TRUE, stringsAsFactors=FALSE)

colSums(x[,-1]*x[,1])

X1 X2 X3 
 2  2  2

You can also use crossprod(x[,1],as.matrix(x[,-1]))

     X1 X2 X3
[1,]  2  2  2
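The product trick relies on every X column being coded 0/1, so that Y*X keeps Y where X is 1 and zeroes it elsewhere. The sketch below checks this against an explicit subset on the sample data, and shows a variant that builds the 0/1 indicator with `== 1` first, which should also cover columns that hold other values. The names `by_product`, `by_subset`, and `by_indicator` are only illustrative.

```r
# Sample data copied from the question
x <- data.frame(
  Y  = c(1, 1, 0, 0, 1, 0),
  X1 = c(1, 0, 1, 0, 1, 1),
  X2 = c(0, 1, 0, 0, 1, 1),
  X3 = c(1, 1, 1, 1, 0, 0)
)

# Product approach: multiply each X column by Y, then sum each column
by_product <- colSums(x[, -1] * x[, 1])

# Explicit subset approach, for comparison
by_subset <- sapply(x[, -1], function(a) sum(x$Y[a == 1]))

# Variant: convert each X column to a TRUE/FALSE indicator before multiplying,
# so only rows where X is exactly 1 contribute
by_indicator <- colSums((x[, -1] == 1) * x$Y)

print(by_product)    # X1 = 2, X2 = 2, X3 = 2
print(by_subset)     # X1 = 2, X2 = 2, X3 = 2
print(by_indicator)  # X1 = 2, X2 = 2, X3 = 2
```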

Upvotes: 4
