Homer Jay Simpson
Homer Jay Simpson

Reputation: 1214

How can i add missing month value and remove duplicate in dplyr in R?

I have a dataset that looks like this :

vaR date
A 1/1/2022
A 1/2/2022
A 1/3/2022
B 1/1/2022
B 1/3/2022
C 1/1/2022
C 1/1/2022
C 1/2/2022
C 1/2/2022
C 1/3/2022

And i want to be arranged by month and by the var variable. But if a month is not recorded (missing) i want to be added (to be appeared ) in the new column named Month and to mutate (dplyr phrase) another column that will check if there was an entry on that month (logical condition).But there are some entries for example C that has more that one entries which counts for one (distinct).

Ideally is must look like this :

var Quarter Month Condition
A 1 1 TRUE
A 1 2 TRUE
A 1 3 TRUE
B 1 1 TRUE
B 1 2 FALSE
B 1 3 TRUE
C 1 1 TRUE
C 1 2 TRUE
C 1 3 TRUE

As a start i have tried this one in R :

var = c(rep("A",3),rep("B",2),rep("C",5));var
date = c(as.Date("2022/01/01"),as.Date("2022/02/01"),as.Date("2022/03/01"),
         as.Date("2022/01/01"),as.Date("2022/03/01"),
         as.Date("2022/01/01"),as.Date("2022/01/01"),as.Date("2022/02/01"),as.Date("2022/02/01"),as.Date("2022/03/01"))
data = tibble(var,date)
quarter = 1
data%>%
  dplyr::mutate(month = lubridate::month(date),
                Quarter = quarter)

But i don't know how to add the missing month and check for the verified condition. Any help ?

Upvotes: 0

Views: 154

Answers (2)

langtang
langtang

Reputation: 24822

You can approach it this way:

library(lubridate)
library(dplyr)
libraty(tidyr)

df <- df %>% mutate(month=month(date),quarter=quarter(month))
left_join(
  expand(df, var,month,quarter),
  select(df,var, month) %>% mutate(condition=TRUE) %>%  distinct()
) %>% mutate(condition=!is.na(condition))

Output

  var   month quarter condition
  <chr> <dbl>   <int> <lgl>    
1 A         1       1 TRUE     
2 A         2       1 TRUE     
3 A         3       1 TRUE     
4 B         1       1 TRUE     
5 B         2       1 FALSE    
6 B         3       1 TRUE     
7 C         1       1 TRUE     
8 C         2       1 TRUE     
9 C         3       1 TRUE  

Upvotes: 1

lhs
lhs

Reputation: 1038

You can use complete() to fill in the missing months and then check whether they have an associated date, then use distinct() to find the unique combinations.

library(dplyr)
library(tidyr)

var = c(rep("A",3),rep("B",2),rep("C",5))
date = c(as.Date("2022/01/01"),as.Date("2022/02/01"),as.Date("2022/03/01"),
         as.Date("2022/01/01"),as.Date("2022/03/01"),
         as.Date("2022/01/01"),as.Date("2022/01/01"),as.Date("2022/02/01"),as.Date("2022/02/01"),as.Date("2022/03/01"))
data = tibble(var,date)
quarter = 1
data %>% 
  mutate(month = lubridate::month(date)) %>% 
  complete(var, month) %>% 
  mutate(Quarter = quarter,
         Condition = !is.na(date)) %>% 
  distinct(var, month, Quarter, Condition)
#> # A tibble: 9 × 4
#>   var   month Quarter Condition
#>   <chr> <dbl>   <dbl> <lgl>    
#> 1 A         1       1 TRUE     
#> 2 A         2       1 TRUE     
#> 3 A         3       1 TRUE     
#> 4 B         1       1 TRUE     
#> 5 B         2       1 FALSE    
#> 6 B         3       1 TRUE     
#> 7 C         1       1 TRUE     
#> 8 C         2       1 TRUE     
#> 9 C         3       1 TRUE

Created on 2022-06-01 by the reprex package (v2.0.1)

Upvotes: 3

Related Questions