Ludrew
Ludrew

Reputation: 5

create increment index for each group, starting row conditional on other column

Let's assume I have a dataset similar to this:

library(tidyverse)

country <- c(rep("Germany", 9), rep("Greece", 9), rep("Turkey", 9), rep("Austria", 9))
date <- rep(seq(ymd('2012-04-07'),ymd('2012-04-15'),by='days'), 4)
gyros_eaten <- c(lag(1:3, n = 2, default = 0), floor(runif(6, min=1, max=4)),
                 lag(1:6, n = 5, default = 0), floor(runif(3, min=1, max=4)),
                 lag(1:2, n = 1, default = 0), floor(runif(7, min=1, max=4)),
                 lag(1:3, n = 2, default = 0), floor(runif(6, min=1, max=4)))
df <- data.frame(country, date, gyros_eaten)

I want to create a new column, name it days_since_first_local_gyros, which would tell me how many days have passed since eating the first gyros, in that particular country. That is, it would take the value of zero if no gyros was eaten yet, or if on that day I ate my first gyros in e.g. Greece. After that, it would run from 1 to infinity, increasing by 1 on each day. Hence each group (i.e. country) would have a distinct index, and I have rows for each possible country x day dyads in my dataset, within a particular time range.

How should I create this column? I assume it's some combination of case when, group_by and lag, but i just can't wrap my head around it.

Upvotes: 0

Views: 259

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 388807

We could get the first date where gyros_eaten > 0 and subtract all the dates from that date in the country. We use pmax to keep max value of 0 in days_since_first_local_gyros.

library(dplyr)

df %>%
  arrange(country, date) %>%
  group_by(country) %>%
  mutate(days_since_first_local_gyros = pmax(as.integer(date - 
                            date[which.max(gyros_eaten > 0)]), 0))

#   country date       gyros_eaten days_since_first_local_gyros
#   <chr>   <date>           <dbl>                        <dbl>
# 1 Austria 2012-04-07           0                            0
# 2 Austria 2012-04-08           0                            0
# 3 Austria 2012-04-09           1                            0
# 4 Austria 2012-04-10           1                            1
# 5 Austria 2012-04-11           1                            2
# 6 Austria 2012-04-12           2                            3
# 7 Austria 2012-04-13           2                            4
# 8 Austria 2012-04-14           2                            5
# 9 Austria 2012-04-15           3                            6
#10 Germany 2012-04-07           0                            0
# … with 26 more rows

Upvotes: 1

Related Questions