ViSa
ViSa

Reputation: 2225

How to join two tables with help of 3rd mapping table in r?

I am new in r and this may sound like a silly question but I was trying to join two tables by country but got nan in some as their names do not match. (data from covid19.analytics & world bank data - WDI)

Now I have created a mapping table for country names that didn't match with country names from both table.

How can I join them to replace nan for the mis-matched country name rows ?

Data sources:

library(tidyverse)

url_country_stats <- "https://raw.githubusercontent.com/johnsnow09/covid19-df_stack-code/main/country_stats.csv"

url__ts <- "https://raw.githubusercontent.com/johnsnow09/covid19-df_stack-code/main/ts_all_long.csv"


country_stats <- read.csv(url(url_country_stats))
ts_all_long <- read.csv(url(url_ts))

joining tables:

ts_all_long <- ts_all_long %>% 
  left_join(y = country_stats %>% 
              select(-c("year","longitude","latitude","lending","capital")), 
            by = c("Country.Region" = "country")) %>% 
  relocate(c("Confirmed_daily","Recovered_daily","Death_daily","Case_Fatality_Ratio_prcnt","Confirmed","Recovered","Death"), 
           .after = income)

ts_all_long

Countries names that got mismatched/ gives na:

ts_all_long %>% 
  filter(is.na(iso2c),
         date == max(date)) %>% 
  pull(Country.Region)


[1] "Burma"                            "Congo (Kinshasa)"                
 [3] "Czechia"                          "Egypt"                           
 [5] "Iran"                             "Korea, South"                    
 [7] "Kyrgyzstan"                       "Russia"                          
 [9] "Saint Vincent and the Grenadines" "Slovakia"                        
[11] "Taiwan*"                          "US"                              
[13] "Venezuela"                        "Yemen" 

Mapping table for non matching country names from both table:

mapping_tbl <- data.frame(tbl_ts_all_long = c("Burma","Congo (Kinshasa)","Czechia","Egypt","Iran","Korea, South",
                     "Kyrgyzstan","Russia","Saint Vincent and the Grenadines","Slovakia","Taiwan*",
                     "US","Venezuela","Yemen"),

tbl_country_stats =  c("Myanmar","Congo, Dem. Rep.","Czech Republic","Egypt, Arab Rep.","Iran, Islamic Rep.","Korea, Rep.",
  "Kyrgyz Republic","Russian Federation","St. Vincent and the Grenadines","Slovak Republic","Taiwan Not Available","United States","Venezuela, RB","Yemen, Rep.")
) 

mapping_tbl

Now how can I join them to get values for above mentioned countries that have nan in their rows ?

Upvotes: 1

Views: 194

Answers (1)

akrun
akrun

Reputation: 886948

We could do a join on the 'mapping_tbl' first, coalesce the 'Country.Region' and the 'tbl_ts_all_long', then use the OP's code

out <- ts_all_long %>% 
      left_join(mapping_tbl, by = c("Country.Region" =  "tbl_ts_all_long")) %>% 
      mutate(Country.Region = coalesce(tbl_country_stats, Country.Region)) %>%
      left_join(y = country_stats %>% 
          select(-c("year","longitude","latitude","lending","capital")), 
        by = c("Country.Region" = "country")) %>% 
         relocate(c("Confirmed_daily","Recovered_daily","Death_daily",
               "Case_Fatality_Ratio_prcnt","Confirmed","Recovered","Death"), 
       .after = income) 

By checking the output

out %>% 
    filter(is.na(iso2c),
         date == max(date)) %>% 
   pull(Country.Region)
#[1] "Namibia"              "Taiwan Not Available"

the number of mismatches got reduced and the ones that are not matching should also be included in the 'mapping_tbl'

Upvotes: 1

Related Questions