mikimouse

R: check that a data frame contains all necessary combinations across 3 columns

I have a large df in R that contains a list of countries, years and scenarios. I need to check that the df contains all necessary combinations of those three variables according to a pre-specified list. See example:

Pre-defined lists:

country_list = c("AFG", "BWA", "CIV")
year_list = c(2020, 2021, 2022)
scenario_list = c("Const", "Decr", "Incr")

Dataframe:

ISO3 = c("AFG", "AFG","AFG","AFG", "AFG","AFG","AFG", "AFG","AFG","BWA", "BWA","BWA","BWA", "BWA","BWA","BWA", "BWA","BWA","CIV","CIV","CIV","CIV","CIV","CIV","CIV","CIV","CIV")

Year = c(2020, 2021, 2022, 2020, 2021, 2022, 2020, 2021, 2022, 2020, 2021, 2022, 2020, 2021, 2022, 2020, 2021, 2022, 2020, 2021, 2023, 2020, 2021, 2023, 2020, 2021, 2023)

Scenario = c("Const", "Const","Const","Decr", "Decr","Decr","Incr","Incr","Incr","Const", "Const","Const","Decr", "Decr","Decr","Incr","Incr","Incr","Const", "Const","Const","Decr", "Decr","Decr","Incr","Incr","Incr")

df = data.frame(ISO3, Year, Scenario)

The first rows of the resulting df look like this:

ISO3 Year Scenario
AFG 2020 Const
AFG 2021 Const
AFG 2022 Const
AFG 2020 Decr
AFG 2021 Decr
AFG 2022 Decr

I would like a piece of code that outputs the missing row combinations. In this case, CIV has no scenarios for the year 2022, only for 2020, 2021 and 2023.
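A quick way to confirm the gap described above is to cross-tabulate ISO3 against Year with base R's table(). This is only a diagnostic sketch: it rebuilds the example df in compact rep() form (same rows as the vectors above), and it shows counts per country/year rather than listing the missing scenarios themselves.

```r
# Rebuild the example data in compact form:
# each country has 9 rows (3 years x 3 scenarios); CIV has 2023 instead of 2022
df <- data.frame(
  ISO3 = rep(c("AFG", "BWA", "CIV"), each = 9),
  Year = c(rep(c(2020, 2021, 2022), 6), rep(c(2020, 2021, 2023), 3)),
  Scenario = rep(rep(c("Const", "Decr", "Incr"), each = 3), times = 3)
)

# Count rows for each ISO3/Year pair:
# 3 means all scenarios are present, a smaller number means some are missing,
# and 0 means the pair has no scenarios at all
counts <- with(df, table(ISO3, Year))
print(counts)
```

In the printed table, the CIV row has 0 under 2022 and 3 under 2023, while AFG and BWA have 0 under 2023.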

Answers (1)

Jon Spring

We can use tidyr::complete to augment the data frame with any combinations of ISO3 and Year that are missing (built from the values present in the data). Any rows that are added will have a missing Scenario value, so we can filter to see only those.

library(tidyverse)
df |>
  complete(ISO3, Year) |>
  filter(is.na(Scenario))

# A tibble: 3 × 3
  ISO3   Year Scenario
  <chr> <dbl> <chr>   
1 AFG    2023 NA      
2 BWA    2023 NA      
3 CIV    2022 NA    
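The complete() call above builds combinations from the values observed in df (which is why AFG 2023 and BWA 2023 also appear) and only across ISO3 and Year. If the goal is to check all three columns against the pre-defined lists, one possible base R sketch is to build every required combination with expand.grid() and keep those that never occur in df. The helper make_key() is a hypothetical name introduced here, and the df is rebuilt compactly with the same rows as in the question.

```r
# Pre-defined lists of required values (from the question)
country_list <- c("AFG", "BWA", "CIV")
year_list <- c(2020, 2021, 2022)
scenario_list <- c("Const", "Decr", "Incr")

# Example data rebuilt compactly (same rows as the question's df)
df <- data.frame(
  ISO3 = rep(c("AFG", "BWA", "CIV"), each = 9),
  Year = c(rep(c(2020, 2021, 2022), 6), rep(c(2020, 2021, 2023), 3)),
  Scenario = rep(rep(c("Const", "Decr", "Incr"), each = 3), times = 3)
)

# Every required combination of the three lists (3 x 3 x 3 = 27 rows)
required <- expand.grid(
  ISO3 = country_list,
  Year = year_list,
  Scenario = scenario_list,
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse each row to one "ISO3|Year|Scenario" string
# so whole combinations can be compared with %in%
make_key <- function(d) paste(d$ISO3, d$Year, d$Scenario, sep = "|")

# Keep the required combinations that do not occur anywhere in df
missing_rows <- required[!(make_key(required) %in% make_key(df)), ]
rownames(missing_rows) <- NULL
print(missing_rows)
```

With the example data this should list the three CIV 2022 rows (Const, Decr, Incr). Rows in df that fall outside the lists, such as CIV 2023, are simply ignored. dplyr::anti_join(required, df, by = c("ISO3", "Year", "Scenario")) is an equivalent alternative if the tidyverse is already loaded.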
