How do I find which country was added or removed in a specific year in a dataset?

Question

Maybe it is a really simple question, but I am new to R and can't find the answer. Hopefully you will know :)

I am exploring the Freedom in the World dataset, which contains data from 2013 to 2021.

While filtering, I found that some years have 209 countries and others have 210. But I can't figure out which code/function I could use to find out which countries are being added/removed (maybe it's the same country every time, maybe not; I don't know).

The data frame contains variables for the year (Edition) and the country name (Country/Territory). Here is a glimpse of the data frame:

> glimpse(head(Freedom_df))
Rows: 6
Columns: 44
$ `Country/Territory`  "Abkhazia", "Afghanistan", "Albania", "Algeria", "Andorra", "Angola"
$ Region               "Eurasia", "Asia", "Europe", "MENA", "Europe", "SSA"
$ `C/T`                "t", "c", "c", "c", "c", "c"
$ Edition              2021, 2021, 2021, 2021, 2021, 2021
$ Status               "PF", "NF", "PF", "NF", "F", "NF"
$ `PR rating`          5, 5, 3, 6, 1, 6
$ `CL rating`          5, 6, 3, 5, 1, 5
$ A1                   2, 1, 3, 1, 4, 0
$ A2                   2, 1, 3, 1, 4, 2
$ A3                   1, 1, 2, 1, 4, 1
$ A                    5, 3, 8, 3, 12, 3
$ B1                   2, 2, 3, 1, 4, 1
$ B2                   3, 2, 3, 1, 4, 1
$ B3                   2, 1, 3, 1, 4, 1
$ B4                   1, 2, 3, 1, 3, 2
$ B                    8, 7, 12, 4, 15, 5
$ C1                   1, 1, 3, 1, 3, 1
$ C2                   1, 1, 2, 1, 4, 1
$ C3                   2, 1, 2, 1, 4, 0
$ C                    4, 3, 7, 3, 11, 2
$ `Add Q`              0, 0, 0, 0, 0, 0
$ `Add A`              "N/A", "N/A", "N/A", "N/A", "N/A", "N/A"
$ PR                   17, 13, 27, 10, 38, 10
$ D1                   2, 2, 2, 1, 3, 1
$ D2                   2, 1, 4, 1, 3, 2
$ D3                   1, 1, 3, 2, 4, 2
$ D4                   3, 2, 4, 2, 4, 2
$ D                    8, 6, 13, 6, 14, 7
$ E1                   3, 2, 3, 1, 4, 2
$ E2                   2, 1, 3, 1, 4, 2
$ E3                   1, 1, 2, 1, 3, 2
$ E                    6, 4, 8, 3, 11, 6
$ F1                   1, 1, 2, 1, 4, 1
$ F2                   1, 0, 2, 1, 4, 1
$ F3                   1, 0, 2, 2, 4, 1
$ F4                   1, 1, 3, 2, 3, 2
$ F                    4, 2, 9, 6, 15, 5
$ G1                   1, 0, 3, 2, 4, 1
$ G2                   1, 1, 2, 2, 4, 1
$ G3                   2, 0, 2, 2, 3, 1
$ G4                   1, 1, 2, 1, 4, 0
$ G                    5, 2, 9, 7, 15, 3
$ CL                   23, 14, 39, 22, 55, 21
$ Total                40, 27, 66, 32, 93, 31

Here you can see what I mentioned about having 209 or 210 countries depending on the year:

> count(Freedom_df, Edition) 
# A tibble: 9 x 2
  Edition     n
*    
1    2013   209
2    2014   209
3    2015   210
4    2016   210
5    2017   209
6    2018   209
7    2019   209
8    2020   210
9    2021   210

Here are two examples of the expected output.

Example 1: In this case I suppose that 209 countries always remain the same and just one country is added and removed.

 # A tibble: 9 x 4
  Edition     n    Added_country   Removed_country 
*                             
1    2013   209               NA                NA
2    2014   209               NA                NA
3    2015   210   "country_name"                NA
4    2016   210               NA                NA
5    2017   209               NA    "country_name"
6    2018   209               NA                NA
7    2019   209               NA                NA
8    2020   210   "country_name"                NA
9    2021   210               NA                NA

Example 2: In this case I suppose that 207 countries remain the same across all years (2013:2021) and 3 other countries are added/removed, which produces the same yearly counts.

 # A tibble: 9 x 4
  Edition     n    Different_country
 *                   
 1   2013   209          "country_A"
 2   2013   209          "country_B"
 3   2014   209          "country_A"
 4   2014   209          "country_C"
 5   2015   210          "country_A"
 6   2015   210          "country_B"
 7   2015   210          "country_C"
 8   2016   210          "country_A"
 9   2016   210          "country_B"
10   2016   210          "country_C"
11   2017   209          "country_B"
12   2017   209          "country_C"
13   2018   209          "country_B"
14   2018   209          "country_C"
15   2019   209          "country_B"
16   2019   209          "country_C"
17   2020   210          "country_A"
18   2020   210          "country_B"
19   2020   210          "country_C"
20   2021   210          "country_A"
21   2021   210          "country_B"
22   2021   210          "country_C"

I think that's enough information to solve it, let me know if you need any other details. Thanks :)

AnilGoyal · Accepted Answer

EDIT: Since the original data has been traced by @awaji98 in their answer, we can confirm that the following strategy works on it:

library(tidyverse)

freedom <- read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vQwUAEPTrb4AmNNYSdqCupsrXJcDOODfxTEVSZyK-yIAA2ozTGJmWLOnJHa3v-czcBitkfCx2AU_iqj/pub?gid=0&single=true&output=csv")


freedom %>% group_by(Edition) %>%
  summarise(countries = list(Country)) %>%
  mutate(removed = map2(lag(countries), countries, setdiff),
         added = map2(countries, lag(countries, default = list(countries[[1]])), setdiff)) %>%
  group_by(Edition) %>%
  mutate(added = toString(unlist(added)),
         removed = toString(unlist(removed)),
         countries = length(unlist(countries))) %>%
  ungroup()

# A tibble: 9 x 4
  Edition countries removed       added           
                              
1    2013       209 ""            ""              
2    2014       209 ""            ""              
3    2015       210 ""            "Crimea"        
4    2016       210 ""            ""              
5    2017       209 "Puerto Rico" ""              
6    2018       209 ""            ""              
7    2019       209 ""            ""              
8    2020       210 ""            "Eastern Donbas"
9    2021       210 ""            ""
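If the goal is closer to Example 2 in the question (listing every country that is missing from at least one edition), a minimal sketch along the following lines might help. The `toy` data frame and its column names `Edition` and `Country` are hypothetical stand-ins for the real data, so adjust them to match your own columns.

```r
library(dplyr)

# Hypothetical stand-in for the real data: one row per country and edition
toy <- data.frame(
  Edition = c(2013, 2013, 2014, 2014, 2014, 2015, 2015),
  Country = c("X", "Y", "X", "Y", "Z", "X", "Z"),
  stringsAsFactors = FALSE
)

# Total number of editions in the data
n_editions <- n_distinct(toy$Edition)

intermittent <- toy %>%
  distinct(Edition, Country) %>%
  group_by(Country) %>%
  filter(n() < n_editions) %>%  # keep countries absent from at least one edition
  ungroup() %>%
  arrange(Edition, Country)

intermittent
```

Here "X" appears in all three editions and is dropped, while "Y" and "Z" are kept together with the editions in which they appear.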

Old answer

Let's first construct a sample data frame, say freedom, since you have not provided one.

library(tidyverse)

set.seed(2021)
freedom <- data.frame(year = rep(2010:2014, each = 5),
                      country = c(sample(LETTERS[1:8], 5), 
                                  sample(LETTERS[1:8], 5), 
                                  sample(LETTERS[1:8], 5), 
                                  sample(LETTERS[1:8], 5), 
                                  sample(LETTERS[1:8], 5)),
                      val = round(100 * runif(25)))

# Let's have a look at this data in wide form
freedom %>% pivot_wider(id_cols = country, names_from = year, names_sort = TRUE, values_from = val) %>%
  arrange(country)
# A tibble: 8 x 6
  country `2010` `2011` `2012` `2013` `2014`
              
1 A           40     NA     76     52     63
2 B           82     NA     73     NA     20
3 C           NA     90     NA     NA     43
4 D           NA     52     NA     NA     54
5 E           NA     93     29     36     NA
6 F           21     NA     NA     23     NA
7 G           49     23     31      1     NA
8 H           68     62     70     88     17

The code below lists the countries added and removed in each year, each compared with the previous year. All countries are treated as added in the first year.

freedom %>% group_by(year) %>%
  summarise(countries = list(country)) %>%
  mutate(removed = map2(lag(countries), countries, setdiff),
         added = map2(countries, lag(countries), setdiff)) %>%
  select(-countries) %>%
  unnest(c(added, removed))

# A tibble: 14 x 3
    year removed added
       
 1  2010 NA      B    
 2  2010 NA      G    
 3  2010 NA      A    
 4  2010 NA      F    
 5  2010 NA      H    
 6  2011 B       E    
 7  2011 A       D    
 8  2011 F       C    
 9  2012 D       B    
10  2012 C       A    
11  2013 B       F    
12  2014 G       C    
13  2014 F       D    
14  2014 E       B
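The core of the pipeline is just two `setdiff()` calls per pair of consecutive years. As a small illustration, here are the 2010 and 2011 country sets read off the wide table above, written out by hand as plain vectors (the vector names are only for illustration):

```r
# Country sets for two consecutive years, copied from the wide table above
prev_year <- c("A", "B", "F", "G", "H")  # 2010
this_year <- c("C", "D", "E", "G", "H")  # 2011

# Present last year but not this year
removed <- setdiff(prev_year, this_year)

# Present this year but not last year
added <- setdiff(this_year, prev_year)

removed  # "A" "B" "F"
added    # "C" "D" "E"
```

`map2()` applies the same comparison to every year and its `lag()`; only the order of the elements differs, because it follows the row order of the data.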

Or, if you don't want to see the additions for the first year, drop the first row before unnesting:

freedom %>% group_by(year) %>%
  summarise(countries = list(country)) %>%
  mutate(removed = map2(lag(countries), countries, setdiff),
         added = map2(countries, lag(countries), setdiff)) %>%
  select(-countries) %>%
  filter(row_number() != 1) %>%
  unnest(c(added, removed))

# A tibble: 9 x 3
   year removed added
      
1  2011 B       E    
2  2011 A       D    
3  2011 F       C    
4  2012 D       B    
5  2012 C       A    
6  2013 B       F    
7  2014 G       C    
8  2014 F       D    
9  2014 E       B

Or, if you just want to see how many countries were added and removed, do this:

freedom %>% group_by(year) %>%
  summarise(countries = list(country)) %>%
  mutate(removed = map2_int(lag(countries), countries, function(x, y) length(setdiff(x, y))),
         added = map2_int(countries, lag(countries), function(x, y) length(setdiff(x, y)))) %>%
  select(-countries)

# A tibble: 5 x 3
   year removed added
      
1  2010       0     5
2  2011       3     3
3  2012       2     2
4  2013       1     1
5  2014       3     3

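Further update/edit: Since you have stated that the country count is not the same each year, you may also adopt the following strategy, which collapses each year's additions and removals into a single string instead of unnesting them.

# Let's edit the df to have an unequal country count each year
freedom <- freedom[-c(9, 21), ]

# Now summarise additions and removals per year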
freedom %>% group_by(year) %>%
  summarise(countries = list(country)) %>%
  mutate(removed = map2(lag(countries), countries, setdiff),
         added = map2(countries, lag(countries), setdiff)) %>%
  group_by(year) %>%
  mutate(added = toString(unlist(added)),
         removed = toString(unlist(removed)),
         countries = length(unlist(countries))) %>%
  ungroup()

# A tibble: 5 x 4
   year countries removed   added        
                     
1  2010         5 ""        B, G, A, F, H
2  2011         4 "B, A, F" E, D         
3  2012         5 "D"       B, A         
4  2013         5 "B"       F            
5  2014         4 "G, F, E" D, B

I think you can now safely inspect your data for additions/removals each year.
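If you only care about one specific year, a pair of `anti_join()` calls against the previous year is another option. This is a minimal sketch on a hypothetical `toy` data frame with a hypothetical `target` year, not the real dataset:

```r
library(dplyr)

# Hypothetical data: one row per country and year
toy <- data.frame(
  year    = c(2010, 2010, 2010, 2011, 2011),
  country = c("A", "B", "C", "B", "D"),
  stringsAsFactors = FALSE
)

target <- 2011  # the year to inspect (assumed for illustration)

current  <- toy %>% filter(year == target) %>% distinct(country)
previous <- toy %>% filter(year == target - 1) %>% distinct(country)

# In the target year but not in the year before
added <- anti_join(current, previous, by = "country")

# In the year before but not in the target year
removed <- anti_join(previous, current, by = "country")

added
removed
```

With this toy data, `added` contains "D" and `removed` contains "A" and "C".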
