infer quarter column from month and month column from quarter

Question

I have a list of data frames with the same column names; however, some data frames have quarter information and others have month information. Some have both, and some are missing both. All data frames have year information. I am trying to build a condition that derives the missing information, so that I finally get new columns QtrYr and Date.

library(dplyr)
df <- dplyr::tibble(
  m = c(1, 2, NA, NA, NA, NA, 7, NA, 9, NA, NA, 12, NA),
  q = c(NA, NA, 1, 2, 2, 2, NA, 3, 3, 4, 4, 4, NA),
  y = c(2016, 2016, 2016, 2017, 2017, 2017, 2018 , 2018 , 2018 , 2020, 2020, 2020, 2020)
)
print(df)
#> # A tibble: 13 x 3
#>        m     q     y
#>    <dbl> <dbl> <dbl>
#>  1     1    NA  2016
#>  2     2    NA  2016
#>  3    NA     1  2016
#>  4    NA     2  2017
#>  5    NA     2  2017
#>  6    NA     2  2017
#>  7     7    NA  2018
#>  8    NA     3  2018
#>  9     9     3  2018
#> 10    NA     4  2020
#> 11    NA     4  2020
#> 12    12     4  2020
#> 13    NA    NA  2020

lsdf <- list(df1 = df, df2 = df)

Desired output:

out_df <- dplyr::tibble(
  m = c(1, 2, NA, NA, NA, NA, 7, NA, 9, NA, NA, 12, NA),
  q = c(NA, NA, 1, 2, 2, 2, NA, 3, 3, 4, 4, 4, NA),
  y = c(2016, 2016, 2016, 2017, 2017, 2017, 2018, 2018, 2018, 2020, 2020, 2020, 2020),
  qy = c("Q1/2016", "Q1/2016", "Q1/2016", "Q2/2017", "Q2/2017", "Q2/2017", "Q3/2018", "Q3/2018", "Q3/2018", "Q4/2020", "Q4/2020", "Q4/2020", NA),
  dy = c("3/1/2016", "3/1/2016", "3/1/2016", "6/1/2017", "6/1/2017", "6/1/2017", "9/1/2018", "9/1/2018", "9/1/2018", "12/1/2020", "12/1/2020", "12/1/2020", NA)
)

print(out_df)
#> # A tibble: 13 x 5
#>        m     q     y qy      dy       
#>    <dbl> <dbl> <dbl> <chr>   <chr>    
#>  1     1    NA  2016 Q1/2016 3/1/2016 
#>  2     2    NA  2016 Q1/2016 3/1/2016 
#>  3    NA     1  2016 Q1/2016 3/1/2016 
#>  4    NA     2  2017 Q2/2017 6/1/2017 
#>  5    NA     2  2017 Q2/2017 6/1/2017 
#>  6    NA     2  2017 Q2/2017 6/1/2017 
#>  7     7    NA  2018 Q3/2018 9/1/2018 
#>  8    NA     3  2018 Q3/2018 9/1/2018 
#>  9     9     3  2018 Q3/2018 9/1/2018 
#> 10    NA     4  2020 Q4/2020 12/1/2020
#> 11    NA     4  2020 Q4/2020 12/1/2020
#> 12    12     4  2020 Q4/2020 12/1/2020
#> 13    NA    NA  2020 <NA>    <NA>     
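The desired output implies two simple mappings: a month m falls in quarter ceiling(m / 3), and the dy column uses the first day of the last month of each quarter, i.e. month 3 * q. A minimal base-R sketch of both directions; the helper names month_to_qtr and qtr_to_last_month are hypothetical, not part of any package:

```r
# Hypothetical helper: months 1-3 -> Q1, 4-6 -> Q2, 7-9 -> Q3, 10-12 -> Q4
month_to_qtr <- function(m) ceiling(m / 3)

# Hypothetical helper: quarter -> last month of that quarter (Q1 -> 3, ..., Q4 -> 12),
# which is the month used in the desired dy column
qtr_to_last_month <- function(q) 3 * q

month_to_qtr(c(1, 2, 7, 9, 12))   # 1 1 3 3 4
qtr_to_last_month(1:4)            # 3 6 9 12
month_to_qtr(NA)                  # NA: a missing month stays missing
```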

I tried to use case_when. I thought it would be fairly straightforward, but it looks like either I am not passing the conditions as expected or I am going in the wrong direction entirely.

lsdf$df1 %>% dplyr::mutate(
  Qrt = dplyr::case_when(
   is.na(m) & is.na(q) ~ NA,
   is.na(m) & !is.na(q) ~ q,
   m != NULL & q == NA ~ paste0("Q",ceiling(as.numeric(m)/3)),
   m != NULL & q != NULL ~ paste0("Q", q)
))
#> Error: `m != NULL & q == NA ~ paste0("Q", ceiling(as.numeric(m)/3))`, `m != NULL & q != NULL ~ paste0("Q", q)` must be length 13 or one, not 0

Created on 2020-03-31 by the reprex package (v0.3.0)

I was thinking I could build a QtrYr column first and then run this zoo function to get the date:

x <- c("Q1/13", "Q2/14")
as.Date(zoo::as.yearqtr(x, format = "Q%q/%y"))

Appreciate any help in solving this.
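A sketch of that zoo step applied to labels in the QtrYr shape, assuming four-digit years as in df (hence %Y rather than %y). as.Date() on a yearqtr value gives the first day of the quarter by default, and frac = 1 gives the last day:

```r
library(zoo)

x <- c("Q1/2016", "Q4/2020")
# Parse "Qq/yyyy" strings into zoo yearqtr values
qtr <- as.yearqtr(x, format = "Q%q/%Y")

as.Date(qtr)            # "2016-01-01" "2020-10-01" (first day of the quarter)
as.Date(qtr, frac = 1)  # "2016-03-31" "2020-12-31" (last day of the quarter)
```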

akrun · Accepted Answer

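case_when and if_else do a type check, so all the condition outputs need to be of the same type. Also, it is not clear why NULL is being checked on a vector (i.e. a column): NULL is automatically dropped when it is combined into a vector, and it can only persist as an element of a list. Comparing a column with NULL, as in m != NULL, returns a zero-length logical vector, which is why case_when complains about length 0.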

i.e.

c(NA, NULL, 1:3)
#[1] NA  1  2  3

and

list(NULL, NULL, 1:3) 
#[[1]]
#NULL

#[[2]]
#NULL

#[[3]]
#[1] 1 2 3

In the second case, the NULL elements are kept as they are.
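To see why the original conditions fail, compare how NULL and NA behave in vectorised comparisons. A short base-R illustration; the vector m here is a made-up stand-in for the column:

```r
m <- c(1, NA, 7)

m != NULL    # logical(0): comparing with NULL gives a zero-length result
m == NA      # NA NA NA: any comparison with NA is itself NA
is.na(m)     # FALSE TRUE FALSE: the per-element missingness test
is.null(m)   # FALSE: a single value, since the column itself is not NULL
```

The zero-length result of m != NULL is what produces the "must be length 13 or one, not 0" error in the question, and q == NA can never select a row because it is NA everywhere.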

Here, if we want to keep these checks, use is.null along with is.na, and make sure every branch returns a single type. The q column is numeric, so it has to be converted to character, while a bare NA is logical by default; use NA_character_ instead, because the other branches create character strings with paste0.
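A small sketch of the type check using dplyr::if_else: a numeric branch mixed with a character branch is rejected, while a character branch paired with NA_character_ works. The error is caught with tryCatch only so the example runs to the end:

```r
library(dplyr)

typeof(NA)             # "logical"
typeof(NA_character_)  # "character"

# numeric `true` and character `false`: rejected by the type check
res <- tryCatch(if_else(c(TRUE, FALSE), 1, "Q1"), error = function(e) "error")
res                    # "error"

# both branches character: accepted
if_else(c(TRUE, FALSE), "Q1", NA_character_)   # "Q1" NA
```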

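library(dplyr)
lsdf$df1 %>% dplyr::mutate(
  Qrt = dplyr::case_when(
    # neither month nor quarter: missing, typed as character
    is.na(m) & is.na(q) ~ NA_character_,
    # quarter only: use it directly
    is.na(m) & !is.na(q) ~ paste0("Q", q),
    # month only: derive the quarter from the month
    !is.null(m) & is.na(q) ~ paste0("Q", ceiling(as.numeric(m) / 3)),
    # both present: use the quarter
    !is.null(m) & !is.null(q) ~ paste0("Q", q)
  ))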

Also, since lsdf is a list, use map to loop over it:

library(dplyr)
library(purrr)
map(lsdf, ~ .x %>% dplyr::mutate(
  Qrt = dplyr::case_when(
    is.na(m) & is.na(q) ~ NA_character_,
    is.na(m) & !is.na(q) ~ paste0("Q", q),
    !is.null(m) & is.na(q) ~ paste0("Q", ceiling(as.numeric(m) / 3)),
    !is.null(m) & !is.null(q) ~ paste0("Q", q)
  )))

Update

If we also need the 'qy' and 'dy' columns as in the updated question:

library(dplyr)
library(purrr)
library(stringr)
library(zoo)
library(lubridate)
map(lsdf, ~
      .x %>%
        # quarter from q where present, otherwise derived from the month
        mutate(q1 = coalesce(q, ceiling(m / 3))) %>%
        mutate(qy = case_when(is.na(q1) ~ NA_character_,
                              TRUE ~ str_c("Q", q1, "/", y))) %>%
        select(-q1) %>%
        # last day of the quarter, floored to the first day of that month
        mutate(dy = floor_date(as.Date(as.yearqtr(qy, "Q%q/%Y"), frac = 1), "month")))
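For comparison, a base-R sketch of the same qy/dy derivation that needs neither zoo nor lubridate. It assumes the column names m, q and y from the question; the helper add_qy_dy is hypothetical, and dy comes back as a Date rather than as an m/d/Y string. lapply() plays the role of map() here:

```r
# Data from the question
df <- data.frame(
  m = c(1, 2, NA, NA, NA, NA, 7, NA, 9, NA, NA, 12, NA),
  q = c(NA, NA, 1, 2, 2, 2, NA, 3, 3, 4, 4, 4, NA),
  y = c(2016, 2016, 2016, 2017, 2017, 2017, 2018, 2018, 2018, 2020, 2020, 2020, 2020)
)
lsdf <- list(df1 = df, df2 = df)

# Hypothetical helper: adds qy ("Qq/yyyy") and dy (first day of the quarter's last month)
add_qy_dy <- function(d) {
  # keep q where present, otherwise derive the quarter from the month
  qtr <- ifelse(is.na(d$q), ceiling(d$m / 3), d$q)
  d$qy <- ifelse(is.na(qtr), NA_character_, paste0("Q", qtr, "/", d$y))
  # ISOdate() gives NA when the month is NA, so rows without period info stay NA
  d$dy <- as.Date(ISOdate(d$y, 3 * qtr, 1))
  d
}

out <- lapply(lsdf, add_qy_dy)
out$df1[, c("qy", "dy")]
```

Deriving the quarter from the month row by row avoids borrowing a neighbouring row's quarter, so a row such as m = 7 with a missing q gets Q3 regardless of its position in the data.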
