Reputation: 633
Consider the following data frame named mydata.
id s1 s2 s3 t1 t2 t3
1 1 0 0 0 1 0
2 0 0 1 0 0 1
3 1 0 0 1 0 0
4 0 1 0 0 1 0
5 0 1 0 1 0 0
6 0 0 1 0 0 1
7 0 0 1 0 1 0
8 1 0 0 0 0 1
9 0 1 0 0 0 1
10 0 0 1 0 0 1
My intention is to get the conditional proportion for each t_i given s_i. For example, the conditional proportion for t1 given s1 is computed as: (number of rows with s1==1 & t1==1)/(number of rows with s1==1) = 1/3. I want to repeat this for all possible combinations using a for loop in R.
Any help is highly appreciated. Thanks!
Upvotes: 0
Views: 184
Reputation: 269556
We show how to do this without looping by using matrix math and, in a special case that covers the sample input shown in the question, by using regression.
Get the s columns as a matrix mats and the t columns as a matrix matt. Then use the matrix expression shown and optionally add the row names.
nms <- names(mydata)
is <- startsWith(nms, "s")
it <- startsWith(nms, "t")
mats <- as.matrix(mydata[is])
matt <- as.matrix(mydata[it])
crossprod(mats, matt) / colSums(mats)
giving:
t1 t2 t3
s1 0.3333333 0.3333333 0.3333333
s2 0.3333333 0.3333333 0.3333333
s3 0.0000000 0.2500000 0.7500000
As a double check note that the s1/t1 cell in the above matrix is 1/3 as in the question.
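Since the question asks specifically for a for loop, the same table can also be filled in cell by cell. The following is only a sketch of that approach: the names s_names, t_names, res and tn are made up for illustration, and the data is rebuilt inline so the block runs on its own.

```r
# Sample data from the question, rebuilt here so the block is self-contained.
Lines <- "
id s1 s2 s3 t1 t2 t3
1 1 0 0 0 1 0
2 0 0 1 0 0 1
3 1 0 0 1 0 0
4 0 1 0 0 1 0
5 0 1 0 1 0 0
6 0 0 1 0 0 1
7 0 0 1 0 1 0
8 1 0 0 0 0 1
9 0 1 0 0 0 1
10 0 0 1 0 0 1"
mydata <- read.table(text = Lines, header = TRUE)

# Illustrative names: pick out the s and t columns by prefix.
s_names <- grep("^s", names(mydata), value = TRUE)
t_names <- grep("^t", names(mydata), value = TRUE)

# One row per s column, one column per t column.
res <- matrix(NA_real_, nrow = length(s_names), ncol = length(t_names),
              dimnames = list(s_names, t_names))

for (s in s_names) {
  for (tn in t_names) {
    # (number of rows with s == 1 and t == 1) / (number of rows with s == 1)
    res[s, tn] <- sum(mydata[[s]] == 1 & mydata[[tn]] == 1) / sum(mydata[[s]] == 1)
  }
}
res
```

The loop should agree with the crossprod expression above; the matrix version is shorter and avoids the explicit iteration.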
In the question there is exactly one 1 in each row of the s columns. If that holds in general (more generally, we just need the columns of mats to be orthogonal), then the result can be obtained as the coefficients of the following regression:
coef( lm(cbind(t1, t2, t3) ~ s1 + s2 + s3 + 0, mydata))
giving:
t1 t2 t3
s1 3.333333e-01 0.3333333 0.3333333
s2 3.333333e-01 0.3333333 0.3333333
s3 5.551115e-17 0.2500000 0.7500000
or equivalently (except for slightly different row names):
coef(lm(matt ~ mats + 0))
or
solve(crossprod(mats), crossprod(mats, matt))
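To see why the regression reproduces the proportions, note that when the columns of mats are orthogonal, crossprod(mats) is diagonal with the column sums on its diagonal, so solving the normal equations amounts to dividing each row of crossprod(mats, matt) by the matching column sum. The check below is a sketch under that assumption; XtX, direct and ols are illustrative names, and the data is rebuilt inline so the block runs on its own.

```r
# Sample data from the question, rebuilt here so the block is self-contained.
Lines <- "
id s1 s2 s3 t1 t2 t3
1 1 0 0 0 1 0
2 0 0 1 0 0 1
3 1 0 0 1 0 0
4 0 1 0 0 1 0
5 0 1 0 1 0 0
6 0 0 1 0 0 1
7 0 0 1 0 1 0
8 1 0 0 0 0 1
9 0 1 0 0 0 1
10 0 0 1 0 0 1"
mydata <- read.table(text = Lines, header = TRUE)

mats <- as.matrix(mydata[c("s1", "s2", "s3")])
matt <- as.matrix(mydata[c("t1", "t2", "t3")])

# With exactly one 1 per row, the s columns never overlap,
# so every off-diagonal entry of the cross-product is zero.
XtX <- crossprod(mats)
all(XtX[row(XtX) != col(XtX)] == 0)

# The diagonal holds the column sums (0/1 data, so x^2 == x).
all.equal(unname(diag(XtX)), unname(colSums(mats)))

# Hence the normal-equations solution matches the direct ratio.
direct <- crossprod(mats, matt) / colSums(mats)
ols <- solve(XtX, crossprod(mats, matt))
all.equal(direct, ols, check.attributes = FALSE)
```

If some row had more than one 1 among the s columns, the off-diagonal entries would no longer vanish and the regression coefficients would generally differ from the conditional proportions.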
The input mydata in reproducible form is assumed to be:
Lines <- "
id s1 s2 s3 t1 t2 t3
1 1 0 0 0 1 0
2 0 0 1 0 0 1
3 1 0 0 1 0 0
4 0 1 0 0 1 0
5 0 1 0 1 0 0
6 0 0 1 0 0 1
7 0 0 1 0 1 0
8 1 0 0 0 0 1
9 0 1 0 0 0 1
10 0 0 1 0 0 1"
mydata <- read.table(text = Lines, header = TRUE)
Upvotes: 3
Reputation: 887108
We could use Map to get the proportion for each matched pair (t1 given s1, t2 given s2, t3 given s3):
Map(function(x, y) sum(x & y)/sum(y), mydata[startsWith(names(mydata), 't')],
mydata[startsWith(names(mydata), 's')])
Upvotes: 0