Romain
Romain

Reputation: 193

left_join a dataframe mutliple times on different keys

I have a dataframe assigning a code (2nd column) to geographical areas in the first column (can be a country or a continent or a larger area) and then 3 more columns indicating the code of the geographical area the geographical area in the first column belongs to, like this:

library(dplyr)

area <- c('Earth','Europe','Eurasia','Spain','Germany','North America','Latin America','Americas','US','Canada','Brazil','Argentina','Asia','Japan')
code <- c(0,1000,100,1100,1200,2000,2500,200,2100,2200,2600,2700,3000,3100)
level_1 <- as.numeric(c(NA,1000,NA,1000,1000,2000,2500,NA,2000,2000,2500,2500,3000,3000))
level_2 <- as.numeric(c(NA,100,100,100,100,200,200,200,200,200,200,200,100,100))
level_3 <- as.numeric(c(0,0,0,0,0,0,0,0,0,0,0,0,0,0))

data <- data.frame(area,code,level_1, level_2, level_3, stringsAsFactors = F)

What i want to do now is add the names of the geographical areas in column level_1 to level_3. This is my solution:

data2 <- data %>% select(area,code)

final <- data %>%
  left_join(data2, by = c('level_1' = 'code')) %>%
  left_join(data2, by = c('level_2' = 'code')) %>%
  left_join(data2, by = c('level_3' = 'code'))

I get the following, which looks good:

       country.x code level_1 level_2 level_3     country.y country.x.x country.y.y
1          Earth    0      NA      NA       0          <NA>        <NA>       Earth
2         Europe 1000    1000     100       0        Europe     Eurasia       Earth
3        Eurasia  100      NA     100       0          <NA>     Eurasia       Earth
4          Spain 1100    1000     100       0        Europe     Eurasia       Earth
5        Germany 1200    1000     100       0        Europe     Eurasia       Earth
6  North America 2000    2000     200       0 North America    Americas       Earth
7  Latin America 2500    2500     200       0 Latin America    Americas       Earth
8       Americas  200      NA     200       0          <NA>    Americas       Earth
9             US 2100    2000     200       0 North America    Americas       Earth
10        Canada 2200    2000     200       0 North America    Americas       Earth
11        Brazil 2600    2500     200       0 Latin America    Americas       Earth
12     Argentina 2700    2500     200       0 Latin America    Americas       Earth
13          Asia 3000    3000     100       0          Asia     Eurasia       Earth
14         Japan 3100    3000     100       0          Asia     Eurasia       Earth

But i'm wondering if there is no smarter way to do this, especially as in my real task i have to do more than 3 left_joins, any thoughts?

I had a go below, but that does not work since the 'by' argument expects a simple string i guess...

df <- data
for (i in 1:3) {
  df <- left_join(df, data2, by = c(paste0('level_',i) = 'code'), na_matches = 'never')
}

Upvotes: 3

Views: 93

Answers (1)

akrun
akrun

Reputation: 887148

This could be done in an easier way by looping over the level columns in mutate with across, get the index where it matches with 'code' column, use that to subset the corresponding 'area',

library(dplyr)# 1.0.0
data1 <- data %>%
         mutate(across(starts_with('level'),  ~area[match(., code)],
           .names = 'country{col}'))
data1
#            area code level_1 level_2 level_3 countrylevel_1 countrylevel_2 countrylevel_3
#1          Earth    0      NA      NA       0           <NA>           <NA>          Earth
#2         Europe 1000    1000     100       0         Europe        Eurasia          Earth
#3        Eurasia  100      NA     100       0           <NA>        Eurasia          Earth
#4          Spain 1100    1000     100       0         Europe        Eurasia          Earth
#5        Germany 1200    1000     100       0         Europe        Eurasia          Earth
#6  North America 2000    2000     200       0  North America       Americas          Earth
#7  Latin America 2500    2500     200       0  Latin America       Americas          Earth
#8       Americas  200      NA     200       0           <NA>       Americas          Earth
#9             US 2100    2000     200       0  North America       Americas          Earth
#10        Canada 2200    2000     200       0  North America       Americas          Earth
#11        Brazil 2600    2500     200       0  Latin America       Americas          Earth
#12     Argentina 2700    2500     200       0  Latin America       Americas          Earth
#13          Asia 3000    3000     100       0           Asia        Eurasia          Earth
#14         Japan 3100    3000     100       0           Asia        Eurasia          Earth

Or in base R, we do the same logic by looping over the 'level' columns in lapply, then get the index with match and get the corresponding 'area'

nm1 <- grep('^level_\\d+$', names(data), value = TRUE)
data[paste0('country', 1:3)] <- lapply(data[nm1], function(x) 
                 data$area[match(x, data$code)])

Or without looping by converting to a matrix

data[paste0('country', seq_along(nm1))] <- matrix(data$area[match(as.matrix(data[nm1]), 
          data$code)], ncol = length(nm1))

Upvotes: 2

Related Questions