Reputation: 9
I have a large data frame in R with columns for country, year and scenario. I need to check that it contains every combination of those three variables from a pre-specified list. See the example below:
Pre-defined list:
country_list = c("AFG", "BWA", "CIV")
year_list = c(2020, 2021, 2022)
scenario_list = c("Const", "Decr", "Incr")
Dataframe:
ISO3 = c("AFG", "AFG","AFG","AFG", "AFG","AFG","AFG", "AFG","AFG","BWA", "BWA","BWA","BWA", "BWA","BWA","BWA", "BWA","BWA","CIV","CIV","CIV","CIV","CIV","CIV","CIV","CIV","CIV")
Year = c(2020, 2021, 2022, 2020, 2021, 2022, 2020, 2021, 2022, 2020, 2021, 2022, 2020, 2021, 2022, 2020, 2021, 2022, 2020, 2021, 2023, 2020, 2021, 2023, 2020, 2021, 2023)
Scenario = c("Const", "Const","Const","Decr", "Decr","Decr","Incr","Incr","Incr","Const", "Const","Const","Decr", "Decr","Decr","Incr","Incr","Incr","Const", "Const","Const","Decr", "Decr","Decr","Incr","Incr","Incr")
df = data.frame(ISO3, Year, Scenario)
The first rows of the resulting df look like this:
ISO3 | Year | Scenario |
---|---|---|
AFG | 2020 | Const |
AFG | 2021 | Const |
AFG | 2022 | Const |
AFG | 2020 | Decr |
AFG | 2021 | Decr |
AFG | 2022 | Decr |
I would like a piece of code that outputs the missing row combinations. In this case, CIV has no rows for the year 2022 in any scenario, only for 2020, 2021 and 2023.
Upvotes: 0
Views: 23
Reputation: 66415
We can use tidyr::complete to augment the data frame with any missing combinations of ISO3/Year. Any that are added will be missing a Scenario value, so we can filter to just see those.
library(tidyverse)
df |>
  complete(ISO3, Year) |>
  filter(is.na(Scenario))
# A tibble: 3 × 3
ISO3 Year Scenario
<chr> <dbl> <chr>
1 AFG 2023 NA
2 BWA 2023 NA
3 CIV 2022 NA
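Note that complete() builds its combinations from the values already present in df, which is why 2023 shows up for AFG and BWA. If the check should instead run against the pre-defined lists from the question, one possible sketch is to build the full expected grid with tidyr::expand_grid and anti-join it against df. The names expected, missing_rows and unexpected_rows are made up for illustration, and the example data frame is rebuilt compactly with rep() so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Pre-defined lists from the question
country_list  <- c("AFG", "BWA", "CIV")
year_list     <- c(2020, 2021, 2022)
scenario_list <- c("Const", "Decr", "Incr")

# Same 27 rows as the question's df: CIV has 2023 where 2022 should be
df <- data.frame(
  ISO3     = rep(country_list, each = 9),
  Year     = c(rep(year_list, 6), rep(c(2020, 2021, 2023), 3)),
  Scenario = rep(rep(scenario_list, each = 3), times = 3)
)

# Every combination that should exist according to the lists
expected <- expand_grid(
  ISO3     = country_list,
  Year     = year_list,
  Scenario = scenario_list
)

keys <- c("ISO3", "Year", "Scenario")

# Expected combinations that are absent from df
missing_rows <- anti_join(expected, df, by = keys)

# Rows in df that are not part of the expected grid
unexpected_rows <- anti_join(df, expected, by = keys)

print(missing_rows)
print(unexpected_rows)
```

With the example data, missing_rows should contain the three CIV/2022 combinations (one per scenario), and unexpected_rows the three CIV/2023 rows. Unlike the complete() approach, this also catches a combination that is missing for every country at once, since the grid comes from the lists rather than from the data.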
Upvotes: 0