Check if all the elements in the Vector are available in the groups in R data frame

Question

I have a data frame in R as follows:

df <- data.frame("location" = c("IND","IND","IND","US","US","US"), type = c("butter","milk","cheese","milk","cheese","yogurt"), quantity = c(2,3,4,5,6,7))

I have a vector as follows:

typeVector <- c("butter","milk","cheese","yogurt")

I need to check whether all four types in the vector are present in the data frame for each location group. If any type is missing from a group, I need to add a row with the missing type and the corresponding location, with quantity set to 0.

This is my expected output:

dfOutput <- data.frame("location" = c("IND","IND","IND","IND","US","US","US","US"), type = c("butter","milk","cheese","yogurt","butter","milk","cheese","yogurt"), quantity = c(2,3,4,0,0,5,6,7))

How can I achieve this in R using the dplyr package?

r2evans · Accepted Answer

library(dplyr)
distinct(df, location) %>%
  tidyr::crossing(type = typeVector) %>%
  full_join(df, ., by = c("location", "type")) %>%
  ungroup() %>%
  mutate(quantity = coalesce(quantity, 0))
#   location   type quantity
# 1      IND butter        2
# 2      IND   milk        3
# 3      IND cheese        4
# 4       US   milk        5
# 5       US cheese        6
# 6       US yogurt        7
# 7      IND yogurt        0
# 8       US butter        0
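As an alternative to the crossing-and-join pipeline above (a different technique, not the one this answer walks through), tidyr::complete() can expand and fill in a single call. This is a minimal sketch that assumes tidyr is installed and reuses the df and typeVector from the question. Note that with its default settings, the fill argument also replaces any NA that was already in quantity.

```r
library(dplyr)
library(tidyr)

# Data from the question
df <- data.frame(
  location = c("IND", "IND", "IND", "US", "US", "US"),
  type     = c("butter", "milk", "cheese", "milk", "cheese", "yogurt"),
  quantity = c(2, 3, 4, 5, 6, 7)
)
typeVector <- c("butter", "milk", "cheese", "yogurt")

# Expand to every location/type combination; new rows get quantity = 0
result_complete <- df %>%
  complete(location, type = typeVector, fill = list(quantity = 0))

result_complete
```

The rows may come back in a different order than with the join-based approach, so sort explicitly if order matters.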

Steps:

Create a temporary frame that crosses every distinct location with every type in typeVector:

distinct(df, location) %>%
  tidyr::crossing(type = typeVector)
# # A tibble: 8 x 2
#   location type  
#   <chr>    <chr> 
# 1 IND      butter
# 2 IND      cheese
# 3 IND      milk  
# 4 IND      yogurt
# 5 US       butter
# 6 US       cheese
# 7 US       milk  
# 8 US       yogurt

Join this back onto the original data, which will produce NAs in the new rows:

... %>%
  full_join(df, ., by = c("location", "type"))
#   location   type quantity
# 1      IND butter        2
# 2      IND   milk        3
# 3      IND cheese        4
# 4       US   milk        5
# 5       US cheese        6
# 6       US yogurt        7
# 7      IND yogurt       NA
# 8       US butter       NA
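The NA rows in this intermediate result are exactly the location/type combinations that were absent from the original data. If you only want to check which types each location is missing, without adding rows, a grouped summary is one way to do it. This is a sketch using the question's df and typeVector; the column names all_present and missing are made up for illustration.

```r
library(dplyr)

# Data from the question
df <- data.frame(
  location = c("IND", "IND", "IND", "US", "US", "US"),
  type     = c("butter", "milk", "cheese", "milk", "cheese", "yogurt"),
  quantity = c(2, 3, 4, 5, 6, 7)
)
typeVector <- c("butter", "milk", "cheese", "yogurt")

# One row per location: are all types present, and if not, which are missing?
check <- df %>%
  group_by(location) %>%
  summarise(
    all_present = all(typeVector %in% type),                      # hypothetical column name
    missing     = paste(setdiff(typeVector, type), collapse = ", ") # hypothetical column name
  )

check
```

For this data, IND is reported as missing yogurt and US as missing butter.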

Change the NA quantities in the new rows to 0 with mutate(quantity = coalesce(quantity, 0)). (Note: if your data already contains NA values that you want to keep, this step needs to be adjusted, because coalesce replaces those too.)
I tend to ungroup all grouped processes when done. It is not necessary for this task, since nothing here is grouped, but if you forget that a frame is grouped and do further work on it, you may get different results, or at least it will be slightly less efficient.
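One possible adjustment for data that already contains NA values you want to keep: instead of joining and then coalescing, build only the missing rows with anti_join and append them. This is a sketch, not the only way to do it; the NA placed in quantity below is a hypothetical value added to demonstrate the behaviour.

```r
library(dplyr)

# Data from the question, with one hypothetical pre-existing NA (IND / cheese)
df <- data.frame(
  location = c("IND", "IND", "IND", "US", "US", "US"),
  type     = c("butter", "milk", "cheese", "milk", "cheese", "yogurt"),
  quantity = c(2, 3, NA, 5, 6, 7)
)
typeVector <- c("butter", "milk", "cheese", "yogurt")

result_keep_na <- distinct(df, location) %>%
  tidyr::crossing(type = typeVector) %>%         # all location/type combinations
  anti_join(df, by = c("location", "type")) %>%  # keep only the combinations df lacks
  mutate(quantity = 0) %>%                       # new rows get 0
  bind_rows(df, .)                               # original rows first, untouched

result_keep_na
```

Because the original rows are never passed through coalesce, the existing NA for IND/cheese survives, while the two added rows get 0.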
