Reputation: 865
I have a data frame in R as follows:
df <- data.frame("location" = c("IND","IND","IND","US","US","US"), type = c("butter","milk","cheese","milk","cheese","yogurt"), quantity = c(2,3,4,5,6,7))
I have a vector as follows:
typeVector <- c("butter","milk","cheese","yogurt")
I need to check whether all four types in the vector are present in the data frame for each location group. If a type is missing from a group, I need to add a row with the missing type, the corresponding location, and a quantity of 0.
This is my expected output:
dfOutput <- data.frame("location" = c("IND","IND","IND","IND","US","US","US","US"), type = c("butter","milk","cheese","yogurt","butter","milk","cheese","yogurt"), quantity = c(2,3,4,0,0,5,6,7))
How can I achieve this in R using the dplyr package?
Upvotes: 0
Views: 163
Reputation: 161110
library(dplyr)
# Build every location/type combination, join it onto the data, then fill the new rows with 0
distinct(df, location) %>%
  tidyr::crossing(type = typeVector) %>%
  full_join(df, ., by = c("location", "type")) %>%
  ungroup() %>%
  mutate(quantity = coalesce(quantity, 0))
# location type quantity
# 1 IND butter 2
# 2 IND milk 3
# 3 IND cheese 4
# 4 US milk 5
# 5 US cheese 6
# 6 US yogurt 7
# 7 IND yogurt 0
# 8 US butter 0
Steps:
Create a temporary frame that is an expansion of location with your types in typeVector:
distinct(df, location) %>%
  tidyr::crossing(type = typeVector)
# # A tibble: 8 x 2
# location type
# <chr> <chr>
# 1 IND butter
# 2 IND cheese
# 3 IND milk
# 4 IND yogurt
# 5 US butter
# 6 US cheese
# 7 US milk
# 8 US yogurt
Join this back onto the original data, which will produce NAs in the new rows:
... %>%
full_join(df, ., by = c("location", "type"))
# location type quantity
# 1 IND butter 2
# 2 IND milk 3
# 3 IND cheese 4
# 4 US milk 5
# 5 US cheese 6
# 6 US yogurt 7
# 7 IND yogurt NA
# 8 US butter NA
Change these new fields from NA to 0 with the mutate. (Note: if you have previously-existing NA values and want to keep them that way, then this process needs to be adjusted.)
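One possible adjustment for that case, offered as a sketch rather than the only way to do it: build the missing rows separately with anti_join and append them, so coalesce never touches values that were already NA. The NA placed in the IND/milk row and the names combos, missing_rows and result are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical variant of the question's data: the IND/milk quantity is a genuine NA.
df <- data.frame(
  location = c("IND", "IND", "IND", "US", "US", "US"),
  type     = c("butter", "milk", "cheese", "milk", "cheese", "yogurt"),
  quantity = c(2, NA, 4, 5, 6, 7),
  stringsAsFactors = FALSE
)
typeVector <- c("butter", "milk", "cheese", "yogurt")

# Every location/type combination, as in the first step above.
combos <- distinct(df, location) %>%
  tidyr::crossing(type = typeVector)

# Keep only the combinations absent from df and give those a quantity of 0.
missing_rows <- anti_join(combos, df, by = c("location", "type")) %>%
  mutate(quantity = 0)

# Append the new rows; the pre-existing NA is left untouched.
result <- bind_rows(df, missing_rows)
```

Here result should contain the two added rows (IND/yogurt and US/butter) with quantity 0, while the IND/milk row keeps its NA.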
I tend to ungroup all grouped processes when done. This is not necessary for this task, but if you forget the data is grouped and do further work on it, you may get different results, or at the very least the work will be slightly less efficient.
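To make the "different results" point concrete, here is a small sketch. The group_by call and the share column are assumptions added for illustration, since the pipeline in this answer never groups the data.

```r
library(dplyr)

df <- data.frame(
  location = c("IND", "IND", "IND", "US", "US", "US"),
  type     = c("butter", "milk", "cheese", "milk", "cheese", "yogurt"),
  quantity = c(2, 3, 4, 5, 6, 7),
  stringsAsFactors = FALSE
)

# Hypothetical earlier step that leaves the frame grouped by location.
grouped <- group_by(df, location)

# Still grouped: sum(quantity) is computed within each location.
share_grouped <- grouped %>%
  mutate(share = quantity / sum(quantity))

# Ungrouped first: sum(quantity) is computed over the whole frame.
share_overall <- grouped %>%
  ungroup() %>%
  mutate(share = quantity / sum(quantity))
```

The same mutate call gives shares that sum to 1 within each location in the first case and to 1 across the whole frame in the second, which is the kind of silent difference that a trailing ungroup() guards against.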
Upvotes: 1