Reputation: 4229
Here is some sample data:
df <- data.frame(group = rep(1:5, rep(2, 5)),
                 value = c(0, -150, 0, 50, 0, -120, 0, 30, 0, -20),
                 flag1 = floor(runif(10)),  # runif() is in [0, 1), so this is always 0
                 flag2 = rep(rbinom(5, 1, .5), rep(2, 5)),
                 flag3 = rep(rbinom(5, 1, .5), rep(2, 5)))
Each group starts with a value of 0, and the second row of each group is the terminal value, which can be either >0 or <0.
For example, group 1:
group value flag1 flag2 flag3
1 0 0 0 0
1 -150 0 0 0
I would like to find out which combinations of the values of flag1-flag3 result in a negative value and which result in a positive one. The example above would indicate that having flag1-flag3 all at 0 (row 1) results in a negative value as the outcome (row 2). I would like to obtain this association both per group and overall.
Upvotes: 1
Views: 46
Reputation: 11514
Consider the following as an example. I group by every observed combination of flag1-flag3 and calculate the proportion of positive and negative terminal values within each combination.
library(dplyr)
# remove the starting rows (value == 0), keeping only the terminal values:
df <- df %>% filter(value != 0)
# group by each combination of flag1-flag3,
# then calculate the share of positive and negative values:
df %>% group_by(flag1, flag2, flag3) %>% summarise(pos = mean(value > 0),
                                                   neg = mean(value < 0))
Source: local data frame [4 x 5]
Groups: flag1, flag2 [?]
flag1 flag2 flag3 pos neg
<dbl> <int> <int> <dbl> <dbl>
1 0 0 0 0.0 1.0
2 0 0 1 0.5 0.5
3 0 1 0 1.0 0.0
4 0 1 1 0.0 1.0
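The summary above pools all groups, while the question also asks for the association per group. A minimal base-R sketch of that view, assuming (as in the sample) that each group has exactly two rows ordered start then terminal, and that terminal values are never 0. The set.seed(1) call and the names start, terminal and per_group are illustrative additions, not from the original post.

```r
# Sample data from the question; set.seed() is an addition for reproducibility
set.seed(1)
df <- data.frame(group = rep(1:5, rep(2, 5)),
                 value = c(0, -150, 0, 50, 0, -120, 0, 30, 0, -20),
                 flag1 = floor(runif(10)),
                 flag2 = rep(rbinom(5, 1, .5), rep(2, 5)),
                 flag3 = rep(rbinom(5, 1, .5), rep(2, 5)))

# Assumption: first row of each group = starting state, second row = terminal value
start    <- df[!duplicated(df$group), c("group", "flag1", "flag2", "flag3")]
terminal <- df[duplicated(df$group), c("group", "value")]

# One row per group: starting flags next to the outcome they led to
per_group <- merge(start, terminal, by = "group")
per_group$outcome <- ifelse(per_group$value > 0, "positive", "negative")
print(per_group)
```

Each row of per_group then reads as "this flag combination in group g led to this outcome", and the pooled table above is its aggregation over groups.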
If you are looking more for regression coefficients, you would probably want to do something like
lm(value > 0 ~ flag1 + flag2 + flag3, data = df)
I am not sure this is what you were asking for, though; I am just adding it in case.
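Since value > 0 is a yes/no outcome, a logistic regression with glm(..., family = binomial) is a common alternative to the linear model; note that this swaps in a different technique from the lm() call suggested. A minimal sketch, assuming the starting rows are dropped first as in the dplyr code, with set.seed(1) added for reproducibility. With only five terminal rows the estimates are purely illustrative, R may warn about fitted probabilities of 0 or 1, and flag1 (always 0 in this sample) gets an NA coefficient.

```r
# Sample data from the question; set.seed() is an addition for reproducibility
set.seed(1)
df <- data.frame(group = rep(1:5, rep(2, 5)),
                 value = c(0, -150, 0, 50, 0, -120, 0, 30, 0, -20),
                 flag1 = floor(runif(10)),
                 flag2 = rep(rbinom(5, 1, .5), rep(2, 5)),
                 flag3 = rep(rbinom(5, 1, .5), rep(2, 5)))

# Keep only the terminal rows, as in the dplyr code
terminal <- df[df$value != 0, ]

# Logistic regression of "terminal value is positive" on the three flags
fit <- glm(value > 0 ~ flag1 + flag2 + flag3, family = binomial, data = terminal)
print(coef(fit))  # flag1 is NA because it is constant (all 0) in this sample
```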
Note that you could also get the grouped table above with the built-in function ftable, but I usually prefer dplyr because it returns a tibble, which is easy to work with.
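A minimal sketch of the ftable route, using base R's xtabs() and ftable(). It assumes the same filtering of starting rows; the outcome column and set.seed(1) are illustrative additions. It shows counts rather than proportions: each row is a flag combination and the columns count negative and positive terminal values. Unlike the dplyr summary, it may also list flag combinations that never occur, with zero counts.

```r
# Sample data from the question; set.seed() is an addition for reproducibility
set.seed(1)
df <- data.frame(group = rep(1:5, rep(2, 5)),
                 value = c(0, -150, 0, 50, 0, -120, 0, 30, 0, -20),
                 flag1 = floor(runif(10)),
                 flag2 = rep(rbinom(5, 1, .5), rep(2, 5)),
                 flag3 = rep(rbinom(5, 1, .5), rep(2, 5)))

# Keep only the terminal rows and label each one by the sign of its value
terminal <- df[df$value != 0, ]
terminal$outcome <- ifelse(terminal$value > 0, "pos", "neg")

# Cross-tabulate: flag combinations as rows, outcome as columns
tab <- xtabs(~ flag1 + flag2 + flag3 + outcome, data = terminal)
ft  <- ftable(tab, row.vars = c("flag1", "flag2", "flag3"))
print(ft)
```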
Upvotes: 2