longlivebrew
longlivebrew

Reputation: 301

dplyr - check if month is there, if not, add it in with an NA

close but not a duplicate: Proper idiom for adding zero count rows in tidyr/dplyr - I am trying to fill based on existing values in df, but also fill based on data that does not have an id. Similar, but fundamentally different.

For each id, I am trying to make sure each has 3 billing months.

Ideally, for each id I need all three required months to be present in df_complete. If it is not in the data, I am looking to add a row with "Not found" for the values.

Additionally, I want to check against all_ids and add ids that are in all_ids but do not have rows in df

months <- as.data.frame(as.Date(c("2016/7/1","2016/9/1","2016/7/1", "2016/8/1","2016/9/1", "2016/8/1","2016/9/1"))) 
id <- as.data.frame(c("a","a","b","b","b","c","c"))
value <- as.data.frame(c(1,2,3,4,5,6,7))
df <- cbind(id,months,value)
colnames(df) <- c("id","billing months","value")
required_months <- as.data.frame(as.Date(c("2016/7/1", "2016/8/1","2016/9/1")))
colnames(required_months)<- "required months"
all_ids <- as.data.frame(c("a","b", "c", "d"))

df ends up looking like:

id  billing months  value
a   7/1/2016    1
a   9/1/2016    2
b   7/1/2016    3
b   8/1/2016    4
b   9/1/2016    5
c   8/1/2016    6
c   9/1/2016    7

What I'm looking for (df_complete):

id  billing months  value
a   7/1/2016    1
a   8/1/2016    Not Found
a   9/1/2016    2
b   7/1/2016    3
b   8/1/2016    4
b   9/1/2016    5
c   7/1/2016    Not Found
c   8/1/2016    6
c   9/1/2016    7
d   7/1/2016    Not Found
d   8/1/2016    Not Found
d   9/1/2016    Not Found

Looking for a dplyr solution, but other packages work too.

Upvotes: 2

Views: 170

Answers (1)

aosmith
aosmith

Reputation: 36084

This looks like a job for tidyr::complete. As you are missing both id variables and months in your original dataset, you'll need to define the values you need filled in via complete. You define what you want to the missing values entered as with fill (although your Not found value will change your column from one that was potentially a column of numbers to a column of characters).

suppressPackageStartupMessages( library(dplyr) )
library(tidyr)

df %>%
    complete(id = c("a","b", "c", "d"), 
             `billing months` = required_months$`required months`,
             fill = list(value = "Not found") )

#> Warning: Column `id` joining character vector and factor, coercing into
#> character vector

#> # A tibble: 12 x 3
#>    id    `billing months` value    
#>    <chr> <date>           <chr>    
#>  1 a     2016-07-01       1        
#>  2 a     2016-08-01       Not found
#>  3 a     2016-09-01       2        
#>  4 b     2016-07-01       3        
#>  5 b     2016-08-01       4        
#>  6 b     2016-09-01       5        
#>  7 c     2016-07-01       Not found
#>  8 c     2016-08-01       6        
#>  9 c     2016-09-01       7        
#> 10 d     2016-07-01       Not found
#> 11 d     2016-08-01       Not found
#> 12 d     2016-09-01       Not found

Created on 2018-03-29 by the reprex package (v0.2.0).

Upvotes: 5

Related Questions