H Hosseini
H Hosseini

Reputation: 43

creating a variable including the lag and lead of a date-time variable

I have a dataset that somehow is similar to the following example. I am trying to create a variable the same as the "expected" variable. That is if the date_time is "2020-01-02 12:00", create a variable including the lag, lead, and the level values of date_time.

Thank you in advance.

    df <- tibble(
             date_time = c("2020-01-01 10:00","2020-01-01 11:00","2020-01-01 12:00","2020-01-01 13:00","2020-01-01 14:00","2020-01-02 10:00","2020-01-02 11:00","2020-01-02 12:00","2020-01-03 10:00","2020-01-03 11:00"), 
             
             date = c("2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-01-02","2020-01-02","2020-01-02","2020-01-03","2020-01-03"),
               x= seq.default(1,5.5,.5),
date_time_NA = c(NA,"2020-01-01 11:00",NA,NA,NA,NA,NA,"2020-01-02 12:00",NA,NA),
            expected = c("2020-01-01 10:00","2020-01-01 11:00","2020-01-01 12:00",NA,NA,NA,"2020-01-02 11:00","2020-01-02 12:00","2020-01-03 10:00",NA) 
             )
df

NEW EXAMPLE

I changed the example above. So I am trying to tell R to find where date_time_NA is not missing and then create a variable including the lag, lead, and the level values of date_time (similar to expected).

UPDATE: another solution

I came up with a solution using the codes kindly shared by @akrun. It might not be a very smart solution; however, I found it intuitive. The main idea is to use ifelse to find the locations where the date_time_NA is not missing. Then we can use row_number() again and add 1 so it can find the lead and deduct 1, so it can find the lag location and replace them with the values of date_time.

df %>%
 mutate(na_row = ifelse(!is.na(date_time_NA),row_number(),NA),
       
         row_level_lag = ifelse(row_number() %in% c(na_row), date_time,NA), 
       
         row_level_now = ifelse(row_number() %in% c(na_row + 1), date_time,NA),
       
         row_level_lead = ifelse(row_number() %in% c(na_row - 1), date_time,NA),
       
         date_time_expected = glue("{row_level_lag} {row_level_now} {row_level_lead}"), 
       
         date_time_expected = stringr::str_replace_all(string = date_time_expected,pattern = "NA",replacement = "")) 

Upvotes: 1

Views: 235

Answers (1)

akrun
akrun

Reputation: 887901

We can use match to get the index of the concerned 'date_time' value, then get the index of previous, and next value by adding a sequence from -1:1. Use that to create a 'new' column with the subset value of 'date_time'

i1 <- match("2020-01-02 12:00", df$date_time)
i2 <- i1 + (-1:1)
df$new <- NA_character_
df$new[i2] <- df$date_time[i2]

-output

df
# A tibble: 10 x 5
#   date_time        date           x expected         new             
#   <chr>            <chr>      <dbl> <chr>            <chr>           
# 1 2020-01-01 10:00 2020-01-01   1   <NA>             <NA>            
# 2 2020-01-01 11:00 2020-01-01   1.5 <NA>             <NA>            
# 3 2020-01-01 12:00 2020-01-01   2   <NA>             <NA>            
# 4 2020-01-01 13:00 2020-01-01   2.5 <NA>             <NA>            
# 5 2020-01-01 14:00 2020-01-01   3   <NA>             <NA>            
# 6 2020-01-02 10:00 2020-01-02   3.5 <NA>             <NA>            
# 7 2020-01-02 11:00 2020-01-02   4   2020-01-02 11:00 2020-01-02 11:00
# 8 2020-01-02 12:00 2020-01-02   4.5 2020-01-02 12:00 2020-01-02 12:00
# 9 2020-01-03 10:00 2020-01-03   5   2020-01-03 10:00 2020-01-03 10:00
#10 2020-01-03 11:00 2020-01-03   5.5 <NA>             <NA>            

With dplyr, we can use replace to replace the values in 'date_time' that are not the matched index for the 'date_time' along with the previous and next index index to NA

library(dplyr)
df %>% 
   mutate(new = replace(date_time,
   !row_number()  %in% (match("2020-01-02 12:00", 
           df$date_time) + (-1:1)), NA_character_ ))

-output

# A tibble: 10 x 5
#   date_time        date           x expected         new             
#   <chr>            <chr>      <dbl> <chr>            <chr>           
# 1 2020-01-01 10:00 2020-01-01   1   <NA>             <NA>            
# 2 2020-01-01 11:00 2020-01-01   1.5 <NA>             <NA>            
# 3 2020-01-01 12:00 2020-01-01   2   <NA>             <NA>            
# 4 2020-01-01 13:00 2020-01-01   2.5 <NA>             <NA>            
# 5 2020-01-01 14:00 2020-01-01   3   <NA>             <NA>            
# 6 2020-01-02 10:00 2020-01-02   3.5 <NA>             <NA>            
# 7 2020-01-02 11:00 2020-01-02   4   2020-01-02 11:00 2020-01-02 11:00
# 8 2020-01-02 12:00 2020-01-02   4.5 2020-01-02 12:00 2020-01-02 12:00
# 9 2020-01-03 10:00 2020-01-03   5   2020-01-03 10:00 2020-01-03 10:00
#10 2020-01-03 11:00 2020-01-03   5.5 <NA>             <NA>            

Update

If there are more than one element to match, then we create a grouping column and apply the same code

library(tidyr)
df %>%
    mutate(grp = lead(cumsum(!is.na(date_time_NA)))) %>%
    fill(grp) %>%
    group_by(grp) %>% 
    mutate(new = replace(date_time,
      !row_number()  %in% (match(date_time_NA[complete.cases(date_time_NA)][1],
            date_time) + (-1:1)), NA_character_ )) %>%
     ungroup %>%
    select(-grp)
# A tibble: 10 x 6
#   date_time        date           x date_time_NA     expected         new             
#   <chr>            <chr>      <dbl> <chr>            <chr>            <chr>           
# 1 2020-01-01 10:00 2020-01-01   1   <NA>             2020-01-01 10:00 2020-01-01 10:00
# 2 2020-01-01 11:00 2020-01-01   1.5 2020-01-01 11:00 2020-01-01 11:00 2020-01-01 11:00
# 3 2020-01-01 12:00 2020-01-01   2   <NA>             2020-01-01 12:00 2020-01-01 12:00
# 4 2020-01-01 13:00 2020-01-01   2.5 <NA>             <NA>             <NA>            
# 5 2020-01-01 14:00 2020-01-01   3   <NA>             <NA>             <NA>            
# 6 2020-01-02 10:00 2020-01-02   3.5 <NA>             <NA>             <NA>            
# 7 2020-01-02 11:00 2020-01-02   4   <NA>             2020-01-02 11:00 2020-01-02 11:00
# 8 2020-01-02 12:00 2020-01-02   4.5 2020-01-02 12:00 2020-01-02 12:00 2020-01-02 12:00
# 9 2020-01-03 10:00 2020-01-03   5   <NA>             2020-01-03 10:00 2020-01-03 10:00
#10 2020-01-03 11:00 2020-01-03   5.5 <NA>             <NA>             <NA>            

Upvotes: 1

Related Questions