Hristo Hristov

Reputation: 35

Determine outcome based on previous values in a grouped dataframe

I have a dataframe with the following format:

pair  group group_rank win_prob
<chr> <int> <chr>         <dbl>
A         1 first           0.6
A         2 second          0.4
B         3 first           0.5
B         4 second          0.5

It has been produced with the following code snippet:

library(tidyverse)

df <- tibble(pair = rep(c("A", "B"), each = 2),
            group = 1:4,
            group_rank = c("first", "second", "first", "second"),
            win_prob = c(0.6, 0.4, 0.5, 0.5))

My goal is to assign "win" to one group in each pair and "loss" to the other group. In other words, I want to produce the following dataframe with a new column outcome:

pair  group group_rank win_prob outcome
<chr> <int> <chr>         <dbl> <chr>
A         1 first           0.6 win
A         2 second          0.4 loss
B         3 first           0.5 loss
B         4 second          0.5 win

The assignment of "win" or "loss" to the outcome variable should be based on the group_rank and the corresponding value in the win_prob variable. More specifically, for each pair I first want to determine whether the group with group_rank == "first" has won, by checking whether its win_prob >= runif(1) (a Bernoulli trial).

If the condition is satisfied, I want to assign "win" to this group. If the condition is not satisfied, I want to assign "loss".

After I have determined whether the group with group_rank == "first" has won or not, I want to assign the opposite outcome to the group with group_rank == "second". Therefore, if the "first" group has been assigned "win", the second group should be assigned "loss" and vice versa.

In pseudo-code, it should be something like this, but the trick is how to look within a grouped dataframe for the outcome of "first" group, while determining the outcome of the "second" group:

for pair in pairs:
    if group_rank == "first":
        if win_prob >= runif(1):
            outcome <- "win"
        else:
            outcome <- "loss"

    if group_rank == "second":
        if outcome == "win" for group with group_rank == "first":
            outcome <- "loss"
        else:
            outcome <- "win"

Is there a simple way to achieve this within the tidyverse framework?

Upvotes: 1

Views: 50

Answers (1)

s_baldur

Reputation: 33498

Using data.table one could do this:

library(data.table)

res <- c("win", "loss") # Not a good name: these are the outcomes for (first, second) when "first" wins.
setDT(df)[, 
          outcome := {
            first_wins <- win_prob[1] >= runif(1) # one Bernoulli trial per pair
            if (first_wins) res else rev(res)
          }, 
          by = pair]
df # one possible result; it varies with the random draws
   pair group group_rank win_prob outcome
1:    A     1      first      0.6     win
2:    A     2     second      0.4    loss
3:    B     3      first      0.5    loss
4:    B     4     second      0.5     win

Using dplyr:

df %>%
  group_by(pair) %>%
  mutate(temp = win_prob[1] >= runif(1), # one Bernoulli trial per pair
         outcome = ifelse(temp, res, rev(res))) %>%
  ungroup() %>%
  select(-temp)

NOTE:
Both solutions assume the data is already sorted so that, within each pair, the row with group_rank "first" comes before the row with group_rank "second". They also assume exactly two rows per pair.
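
If the row order cannot be guaranteed, one possible variation is to pick the "first" row by its label instead of by position. The following is only a sketch, assuming each pair contains exactly one row with group_rank == "first" and one with group_rank == "second". The names shuffled, result and first_wins are made up for illustration, and set.seed() is there only to make the run repeatable.

```r
library(tidyverse)

set.seed(42)  # assumption: fixed seed, used only to make the sketch repeatable

df <- tibble(pair = rep(c("A", "B"), each = 2),
             group = 1:4,
             group_rank = c("first", "second", "first", "second"),
             win_prob = c(0.6, 0.4, 0.5, 0.5))

# Shuffle the rows to show that the approach does not rely on row order
shuffled <- df[sample(nrow(df)), ]

result <- shuffled %>%
  group_by(pair) %>%
  mutate(
    # One Bernoulli trial per pair, using the win_prob of the "first" group
    first_wins = win_prob[group_rank == "first"] >= runif(1),
    # The "first" row wins when the trial succeeds; the "second" row gets the opposite
    outcome = if_else((group_rank == "first") == first_wins, "win", "loss")
  ) %>%
  ungroup() %>%
  select(-first_wins) %>%
  arrange(pair, group)

result
```

Because the comparison is made against the group_rank label, every pair ends up with exactly one "win" and one "loss" regardless of how the rows are ordered.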

Upvotes: 2
