Reputation: 539
I have a data frame:
Y X1 X2 X3
1 1 0 1
1 0 1 1
0 1 0 1
0 0 0 1
1 1 1 0
0 1 1 0
I want to sum the Y column over the rows where another column equals 1, i.e. sum(Y=1|Xi=1). For example, for column X1, s1 = sum(Y=1|X1=1) = 1 + 0 + 1 + 0 = 2:
Y X1
1 1
0 1
1 1
0 1
For the X2 column, s2 = sum(Y=1|X2=1) = 0 + 1 + 0 = 1:
Y X2
0 1
1 1
0 1
For the X3 column, s3 = sum(Y=1|X3=1) = 1 + 1 + 0 + 0 = 2:
Y X3
1 1
1 1
0 1
0 1
My rough idea is to use apply(df, 2, sum) on the columns of the data frame, but I don't know how to subset the rows based on each Xi before calculating the sum of Y.
Any help is appreciated!
Upvotes: 5
Views: 8870
Reputation: 28825
There are numerous ways to do this. One is getting a subset based on the column you want:
sum(df[df$X1==1,]$Y)
This should work for you.
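For completeness, here is a minimal self-contained sketch of the same subsetting idea. It assumes the data frame is called df and holds the six rows from the question. The names s1, x_cols and sums are only illustrative and do not come from the original post.

```r
# Rebuild the question's data frame (assumed to be named df, as above).
df <- data.frame(
  Y  = c(1, 1, 0, 0, 1, 0),
  X1 = c(1, 0, 1, 0, 1, 1),
  X2 = c(0, 1, 0, 0, 1, 1),
  X3 = c(1, 1, 1, 1, 0, 0)
)

# Keep only the rows where X1 is 1, then add up Y over those rows.
s1 <- sum(df[df$X1 == 1, ]$Y)
print(s1)

# Repeat the same subset-then-sum step for every column except Y.
# x_cols and sums are hypothetical names used for illustration.
x_cols <- setdiff(names(df), "Y")
sums <- vapply(x_cols, function(col) sum(df$Y[df[[col]] == 1]), numeric(1))
print(sums)
```

Using df[[col]] to look up each column by name keeps the subsetting step identical to the single-column version, so it should be easy to check one column by hand first.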
Upvotes: 6
Reputation: 32548
Here's one more approach that you could modify to sum elements corresponding to 1, 0, or some other value.
sapply(x[,-1], function(a) sum(x$Y[a == 1]))
#X1 X2 X3
# 2 2 2
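As a rough sketch of the "some other value" variation mentioned above, the matching value can be turned into a function argument. The helper name sum_y_where and its value parameter are made up for this example, and the sketch assumes Y is the first column of the data frame.

```r
# The question's data, read the same way as elsewhere in this thread.
x <- read.table(text = "Y X1 X2 X3
1 1 0 1
1 0 1 1
0 1 0 1
0 0 0 1
1 1 1 0
0 1 1 0", header = TRUE)

# Hypothetical helper: for each non-Y column, sum Y over the rows
# where that column equals `value`. Assumes Y is the first column.
sum_y_where <- function(data, value = 1) {
  sapply(data[, -1], function(a) sum(data$Y[a == value]))
}

ones  <- sum_y_where(x, 1)  # rows where the X column is 1
zeros <- sum_y_where(x, 0)  # rows where the X column is 0
print(ones)
print(zeros)
```

Note that a column containing NA would need extra handling (for example which(a == value)), since this sketch does not guard against missing values.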
Upvotes: 2
Reputation: 16277
You can use colSums and count the rows where Y*X equals 1. I think there's an error in your desired output for the X2 column: rows 2 and 5 contain 1 for both Y and X2, so the sum should be 2.
x=read.table(text="Y X1 X2 X3
1 1 0 1
1 0 1 1
0 1 0 1
0 0 0 1
1 1 1 0
0 1 1 0",header=TRUE, stringsAsFactors=FALSE)
colSums(x[,-1]*x[,1])
X1 X2 X3
2 2 2
You can also use crossprod(x[,1],as.matrix(x[,-1]))
X1 X2 X3
[1,] 2 2 2
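A quick way to convince yourself that these give the same answer as explicit subsetting is to compute all three side by side. This is only a sketch on the question's data. The names by_subset, by_product and by_cross are illustrative, and the product-based versions are assumed to match only because every X column is a 0/1 indicator.

```r
x <- read.table(text = "Y X1 X2 X3
1 1 0 1
1 0 1 1
0 1 0 1
0 0 0 1
1 1 1 0
0 1 1 0", header = TRUE, stringsAsFactors = FALSE)

# Explicit subsetting: sum Y over rows where each X column equals 1.
by_subset <- sapply(x[, -1], function(a) sum(x$Y[a == 1]))

# Multiply each X column by Y, then sum down the columns.
by_product <- colSums(x[, -1] * x[, 1])

# Same sums as a 1 x 3 matrix product; drop() turns it into a named vector.
by_cross <- drop(crossprod(x[, 1], as.matrix(x[, -1])))

print(rbind(by_subset, by_product, by_cross))
```

If an X column held values other than 0 and 1, the two product-based versions would weight Y by those values, while the subsetting version would still only count rows equal to 1.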
Upvotes: 4