Reputation: 3
Team - thank you for your help. FYI - I spent the last few hours trying to find a similar request that doesn't use for_loops but I can't find one.
I have a large data set and using a for
loop is time restrictive. I am attempting to build a count down column which provides me the number of days till the next event. The start of the next "vacation" event is 0. When working backwards it should increase. Below is are two datasets current and desired. I obviously filled in the desired$days_till by hand. Thank you for any tips / help.
#### Current data frame######
data <- data.frame(event = c("school", "school", "school", "vacation", "school", "school", "school", "school", "vacation"),
date = c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05", "2020-01-06", "2020-01-07", "2020-01-08", "2020-01-09"),
days_till = "")
#### Desired data frame######
desired <- data <- data.frame(event = c("school", "school", "school", "vacation", "school", "school", "school", "school", "vacation"),
date = c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05", "2020-01-06", "2020-01-07", "2020-01-08", "2020-01-09"),
days_till = c(3,2,1,0,4,3,2,1,0))
Upvotes: 0
Views: 191
Reputation: 2268
The following is an answer based on tidyverse
packages. In general, I recommend to work with the appropriate data-types. I.e. make your date variable a date-type
. This way you can tap into a variety of useful functions (or get appropriate dimensioned outputs).
The approach you can use is as follows:
library(dplyr) # for dataframe crunching
library(tidyr) # for rinsing dataframes
library(lubridate) # for date-time handling
# coerce date to date-type
data <- data %>% mutate(date = ymd(date)) # ymd() from lubridate
# identify and assign next vacation date
data <- data %>%
mutate( days_till = if_else(event == "vacation", date, as.Date(NA))) %>%
fill( days_till, .direction = "up") %>% # fill() from tidyr
# determine delta and coerce to integer
mutate (days_till = (days_till - date) %>% as.integer() )
If you skip as.integer() at the end, it will give you the difference in units of days.
event date days_till
1 school 2020-01-01 3
2 school 2020-01-02 2
3 school 2020-01-03 1
4 vacation 2020-01-04 0
5 school 2020-01-05 4
6 school 2020-01-06 3
7 school 2020-01-07 2
8 school 2020-01-08 1
9 vacation 2020-01-09 0
Follow-up question: multiple students --> group_by()
# generate multi-student data
data <- data %>% mutate(date = ymd(date), student = "S1")
data2 <- data %>%
# replace a few dates with different vacation dates
mutate(event = if_else(day(date) %in% c(3,6,9), "vacation", "school")
, student = "S2")
data <- bind_rows(data, data2)
# group by student and run same routine
data <- data %>%
#------------------------- group-by ------------------------
group_by(student) %>%
#------------------------ operation over all groups ---------
mutate( days_till = if_else(event == "vacation", date, as.Date(NA))) %>%
fill( days_till, .direction = "up") %>%
mutate (days_till = (days_till - date) %>% as.integer() )
This will yield
event date days_till student
<chr> <date> <int> <chr>
1 school 2020-01-01 3 S1
2 school 2020-01-02 2 S1
3 school 2020-01-03 1 S1
4 vacation 2020-01-04 0 S1
5 school 2020-01-05 4 S1
6 school 2020-01-06 3 S1
7 school 2020-01-07 2 S1
8 school 2020-01-08 1 S1
9 vacation 2020-01-09 0 S1
10 school 2020-01-01 2 S2
11 school 2020-01-02 1 S2
12 vacation 2020-01-03 0 S2
13 school 2020-01-04 2 S2
14 school 2020-01-05 1 S2
15 vacation 2020-01-06 0 S2
16 school 2020-01-07 2 S2
17 school 2020-01-08 1 S2
18 vacation 2020-01-09 0 S2
Upvotes: 1
Reputation: 26218
This will also do
data <- data.frame(event = c("school", "school", "school", "vacation", "school", "school", "school", "school", "vacation"),
date = c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05", "2020-01-06", "2020-01-07", "2020-01-08", "2020-01-09"),
days_till = "")
#### Desired data frame######
desired <- data.frame(event = c("school", "school", "school", "vacation", "school", "school", "school", "school", "vacation"),
date = c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05", "2020-01-06", "2020-01-07", "2020-01-08", "2020-01-09"),
days_till = c(3,2,1,0,4,3,2,1,0))
library(dplyr)
#>
#> Attaching package: 'dplyr'
data %>% group_by(d = data.table::rleid(event)) %>%
mutate(date = as.Date(date),
days_till = ifelse(event == 'vacation', last(date) - date, last(date) - date +1)) %>%
ungroup() %>% select(-d)
#> # A tibble: 9 x 3
#> event date days_till
#> <chr> <date> <dbl>
#> 1 school 2020-01-01 3
#> 2 school 2020-01-02 2
#> 3 school 2020-01-03 1
#> 4 vacation 2020-01-04 0
#> 5 school 2020-01-05 4
#> 6 school 2020-01-06 3
#> 7 school 2020-01-07 2
#> 8 school 2020-01-08 1
#> 9 vacation 2020-01-09 0
Created on 2021-05-10 by the reprex package (v2.0.0)
Upvotes: 1