Kicker47
Kicker47

Reputation: 3

How to create count down without using a for loop in r

Team - thank you for your help. FYI - I spent the last few hours trying to find a similar request that doesn't use for_loops but I can't find one.

I have a large data set and using a for loop is time restrictive. I am attempting to build a count down column which provides me the number of days till the next event. The start of the next "vacation" event is 0. When working backwards it should increase. Below is are two datasets current and desired. I obviously filled in the desired$days_till by hand. Thank you for any tips / help.

#### Current data frame######
data <- data.frame(event = c("school", "school", "school", "vacation", "school", "school", "school", "school", "vacation"), 
                   date = c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05", "2020-01-06", "2020-01-07", "2020-01-08", "2020-01-09"), 
                   days_till = "")

#### Desired data frame######
desired <- data <- data.frame(event = c("school", "school", "school", "vacation", "school", "school", "school", "school", "vacation"), 
                              date = c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05", "2020-01-06", "2020-01-07", "2020-01-08", "2020-01-09"), 
                              days_till = c(3,2,1,0,4,3,2,1,0))

Upvotes: 0

Views: 191

Answers (2)

Ray
Ray

Reputation: 2268

The following is an answer based on tidyverse packages. In general, I recommend to work with the appropriate data-types. I.e. make your date variable a date-type. This way you can tap into a variety of useful functions (or get appropriate dimensioned outputs).

The approach you can use is as follows:

  • identify and assign the next vacation start to each row
  • determine difference between next vacation start and (row) date
library(dplyr)  # for dataframe crunching
library(tidyr)  # for rinsing dataframes
library(lubridate) # for date-time handling

# coerce date to date-type
data <- data %>% mutate(date = ymd(date))     # ymd() from lubridate

# identify and assign next vacation date
data <- data %>% 
  mutate( days_till = if_else(event == "vacation", date, as.Date(NA))) %>%   
  fill(   days_till, .direction = "up") %>%   # fill() from tidyr

# determine delta and coerce to integer
mutate (days_till = (days_till - date) %>% as.integer() )

If you skip as.integer() at the end, it will give you the difference in units of days.

   event       date days_till
1   school 2020-01-01         3
2   school 2020-01-02         2
3   school 2020-01-03         1
4 vacation 2020-01-04         0
5   school 2020-01-05         4
6   school 2020-01-06         3
7   school 2020-01-07         2
8   school 2020-01-08         1
9 vacation 2020-01-09         0

Follow-up question: multiple students --> group_by()

# generate multi-student data
data  <- data %>% mutate(date = ymd(date), student = "S1")
data2 <- data %>% 
   # replace a few dates with different vacation dates
   mutate(event = if_else(day(date) %in% c(3,6,9), "vacation", "school")
 , student = "S2")

data <- bind_rows(data, data2)

# group by student and run same routine
data <- data %>% 
#------------------------- group-by ------------------------
  group_by(student) %>%
#------------------------ operation over all groups ---------
    mutate( days_till = if_else(event == "vacation", date, as.Date(NA))) %>%   
    fill(   days_till, .direction = "up") %>%
    mutate (days_till = (days_till - date) %>% as.integer() )

This will yield

   event    date       days_till student
   <chr>    <date>         <int> <chr>  
 1 school   2020-01-01         3 S1     
 2 school   2020-01-02         2 S1     
 3 school   2020-01-03         1 S1     
 4 vacation 2020-01-04         0 S1     
 5 school   2020-01-05         4 S1     
 6 school   2020-01-06         3 S1     
 7 school   2020-01-07         2 S1     
 8 school   2020-01-08         1 S1     
 9 vacation 2020-01-09         0 S1     
10 school   2020-01-01         2 S2     
11 school   2020-01-02         1 S2     
12 vacation 2020-01-03         0 S2     
13 school   2020-01-04         2 S2     
14 school   2020-01-05         1 S2     
15 vacation 2020-01-06         0 S2     
16 school   2020-01-07         2 S2     
17 school   2020-01-08         1 S2     
18 vacation 2020-01-09         0 S2

Upvotes: 1

AnilGoyal
AnilGoyal

Reputation: 26218

This will also do

data <- data.frame(event = c("school", "school", "school", "vacation", "school", "school", "school", "school", "vacation"), 
                   date = c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05", "2020-01-06", "2020-01-07", "2020-01-08", "2020-01-09"), 
                   days_till = "")

#### Desired data frame######
desired <- data.frame(event = c("school", "school", "school", "vacation", "school", "school", "school", "school", "vacation"), 
                              date = c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05", "2020-01-06", "2020-01-07", "2020-01-08", "2020-01-09"), 
                              days_till = c(3,2,1,0,4,3,2,1,0))
library(dplyr)
#> 
#> Attaching package: 'dplyr'

data %>% group_by(d = data.table::rleid(event)) %>%
  mutate(date = as.Date(date),
         days_till = ifelse(event == 'vacation', last(date) - date, last(date) - date +1)) %>%
  ungroup() %>% select(-d)
#> # A tibble: 9 x 3
#>   event    date       days_till
#>   <chr>    <date>         <dbl>
#> 1 school   2020-01-01         3
#> 2 school   2020-01-02         2
#> 3 school   2020-01-03         1
#> 4 vacation 2020-01-04         0
#> 5 school   2020-01-05         4
#> 6 school   2020-01-06         3
#> 7 school   2020-01-07         2
#> 8 school   2020-01-08         1
#> 9 vacation 2020-01-09         0

Created on 2021-05-10 by the reprex package (v2.0.0)

Upvotes: 1

Related Questions