Reputation: 35
I have a dataframe with the following format:
pair  group group_rank win_prob
<chr> <int> <chr>         <dbl>
A         1 first           0.6
A         2 second          0.4
B         3 first           0.5
B         4 second          0.5
It has been produced with the following code snippet:
library(tidyverse)

df <- tibble(pair = rep(c("A", "B"), each = 2),
             group = 1:4,
             group_rank = c("first", "second", "first", "second"),
             win_prob = c(0.6, 0.4, 0.5, 0.5))
My goal is to assign "win" to one group in each pair and "loss" to the other group. In other words, I want to produce the following dataframe with a new column outcome:
pair  group group_rank win_prob outcome
<chr> <int> <chr>         <dbl> <chr>
A         1 first           0.6 win
A         2 second          0.4 loss
B         3 first           0.5 loss
B         4 second          0.5 win
The assignment of "win" or "loss" to the outcome variable should be based on group_rank and the corresponding value of win_prob. More specifically, for each pair I first want to check whether the group with group_rank == "first" has won, by testing whether its win_prob >= runif(1) (a Bernoulli trial).
If the condition is satisfied, I want to assign "win" to this group. If the condition is not satisfied, I want to assign "loss".
After I have determined whether the group with group_rank == "first" has won or not, I want to assign the opposite outcome to the group with group_rank == "second". Therefore, if the "first" group has been assigned "win", the second group should be assigned "loss" and vice versa.
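To make the trial concrete, here is a minimal sketch of why comparing win_prob with a uniform draw acts as a Bernoulli trial. The seed, the number of repetitions, and the variable names are assumptions chosen only for illustration.

```r
# Sketch: empirical check that `p >= runif(1)` behaves like a Bernoulli(p) trial.
set.seed(1)        # arbitrary seed, only for reproducibility
p <- 0.6           # win_prob of the "first" group in pair A
n_trials <- 10000  # hypothetical number of repetitions

wins <- p >= runif(n_trials)  # vectorised: one uniform draw per trial
win_share <- mean(wins)       # share of TRUE values, expected to be close to p
win_share
```

Over many repetitions the share of TRUE values should settle near win_prob, which is the behaviour wanted for the "first" group.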
In pseudo-code, it should be something like this, but the trick is how to look within a grouped dataframe for the outcome of "first" group, while determining the outcome of the "second" group:
for pair in pairs:
    if group_rank == "first" and win_prob >= runif(1):
        outcome <- "win"
    else:
        outcome <- "loss"
    if group_rank == "second":
        if outcome == "win" for group with group_rank == "first":
            outcome <- "loss"
        else:
            outcome <- "win"
Is there a simple way to achieve this within the tidyverse framework?
Upvotes: 1
Views: 50
Reputation: 33498
Using data.table, one could do this:
library(data.table)

res <- c("win", "loss") # Not a great name: the two outcomes, ordered for the case where "first" wins.
setDT(df)[,
          outcome := {
            # One Bernoulli trial per pair, based on the first row's win_prob
            first_wins <- win_prob[1] >= runif(1)
            if (first_wins) res else rev(res)
          },
          by = pair]
df  # one possible result; the draws are random
   pair group group_rank win_prob outcome
1:    A     1      first      0.6     win
2:    A     2     second      0.4    loss
3:    B     3      first      0.5    loss
4:    B     4     second      0.5     win
Using dplyr:
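# Reuses the res vector defined above
df %>%
  group_by(pair) %>%
  mutate(temp = win_prob[1] >= runif(1)) %>%         # one draw per pair
  mutate(outcome = ifelse(temp, res, rev(res))) %>%  # flip the labels if "first" lost
  ungroup() %>%
  select(-temp)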
NOTE: Both solutions assume the data are already sorted so that, within each pair, the row with group_rank == "first" comes before the row with group_rank == "second".
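If the row order cannot be guaranteed, one option is to look up the "first" row by its label instead of by position. The following is only a sketch under the assumption that every pair contains exactly one row with group_rank == "first"; the names first_wins, df_shuffled and result are made up for illustration, and the seed is arbitrary.

```r
library(dplyr)

# Same data as in the question, but with the rows deliberately out of order
df <- tibble(pair = rep(c("A", "B"), each = 2),
             group = 1:4,
             group_rank = c("first", "second", "first", "second"),
             win_prob = c(0.6, 0.4, 0.5, 0.5))
df_shuffled <- df[c(2, 1, 4, 3), ]

set.seed(42)  # arbitrary seed, only for reproducibility

result <- df_shuffled %>%
  group_by(pair) %>%
  mutate(
    # One draw per pair, using the win_prob of the row labelled "first"
    # (assumes exactly one such row per pair)
    first_wins = win_prob[group_rank == "first"] >= runif(1),
    # A row wins if it is "first" and first won, or "second" and first lost
    outcome = if_else((group_rank == "first") == first_wins, "win", "loss")
  ) %>%
  ungroup() %>%
  select(-first_wins)

result
```

Because the comparison is made on group_rank rather than on row position, each pair still ends up with one "win" and one "loss" regardless of how the rows are sorted.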
Upvotes: 2