Reputation: 165
I am trying to create a variable for a data frame in which I count the number of observations between two observations which meet a criteria. Here it is counting the number of times since last win in a game.
Say I have a dataframe like this:
df <- data.frame(player = c(10,10,10,10,10,10,10,10,10,10,10),win = c(1,0,0,0,1,1,0,1,0,0,1))
I want to create a new variable that counts the number of games it has been since the player has won. Summarized in a vector, the result should be (setting a Not Applicable for the first observation):
c(NA,0,1,2,3,0,0,1,0,1,2)
I want to be able to do this easily and create it as a variable in the data.frame using dplyr (or any other suitable approach)
Upvotes: 0
Views: 541
Reputation: 5898
With {tidyverse}, try:
library(tidyverse)
df <- data.frame(player = c(10,10,10,10,10,10,10,10,10,10,10),
win = c(1,0,0,0,1,1,0,1,0,0,1))
df %>%
group_by(player, group = cumsum(win != lag(win, default = first(win)))) %>%
mutate(counter = row_number(),
counter = if_else(win == 1, true = 0L, false = counter)) %>%
ungroup() %>%
group_by(player) %>%
mutate(counter = if_else(row_number() == 1, true = NA_integer_, false = counter)) %>%
ungroup() %>%
select(-group)
player win counter
<dbl> <dbl> <int>
1 10 1 NA
2 10 0 1
3 10 0 2
4 10 0 3
5 10 1 0
6 10 1 0
7 10 0 1
8 10 1 0
9 10 0 1
10 10 0 2
11 10 1 0
Upvotes: 1
Reputation: 902
I am not quite sure why the first value should be NA. Because the elapsed time is 0 since the last "win" and not NA.
For purely logical reasons, I would take the following approach:
seq = with(df, ave(win, cumsum(win == 1), FUN = seq_along)-1)
So you get the past cummulated sum games since the last win as follows:
c(0,1,2,3,0,0,1,0,1,2,0)
But if you still aim for your described result with a little data handling you can achieve it with this:
append(NA, seq[1:length(seq)-1])
It is not nice, but it works ;)
Upvotes: 1