Reputation: 193
I have a dataframe assigning a code (2nd column) to geographical areas in the first column (can be a country or a continent or a larger area) and then 3 more columns indicating the code of the geographical area the geographical area in the first column belongs to, like this:
library(dplyr)
area <- c('Earth','Europe','Eurasia','Spain','Germany','North America','Latin America','Americas','US','Canada','Brazil','Argentina','Asia','Japan')
code <- c(0,1000,100,1100,1200,2000,2500,200,2100,2200,2600,2700,3000,3100)
level_1 <- as.numeric(c(NA,1000,NA,1000,1000,2000,2500,NA,2000,2000,2500,2500,3000,3000))
level_2 <- as.numeric(c(NA,100,100,100,100,200,200,200,200,200,200,200,100,100))
level_3 <- as.numeric(c(0,0,0,0,0,0,0,0,0,0,0,0,0,0))
data <- data.frame(area,code,level_1, level_2, level_3, stringsAsFactors = F)
What i want to do now is add the names of the geographical areas in column level_1 to level_3. This is my solution:
data2 <- data %>% select(area,code)
final <- data %>%
left_join(data2, by = c('level_1' = 'code')) %>%
left_join(data2, by = c('level_2' = 'code')) %>%
left_join(data2, by = c('level_3' = 'code'))
I get the following, which looks good:
country.x code level_1 level_2 level_3 country.y country.x.x country.y.y
1 Earth 0 NA NA 0 <NA> <NA> Earth
2 Europe 1000 1000 100 0 Europe Eurasia Earth
3 Eurasia 100 NA 100 0 <NA> Eurasia Earth
4 Spain 1100 1000 100 0 Europe Eurasia Earth
5 Germany 1200 1000 100 0 Europe Eurasia Earth
6 North America 2000 2000 200 0 North America Americas Earth
7 Latin America 2500 2500 200 0 Latin America Americas Earth
8 Americas 200 NA 200 0 <NA> Americas Earth
9 US 2100 2000 200 0 North America Americas Earth
10 Canada 2200 2000 200 0 North America Americas Earth
11 Brazil 2600 2500 200 0 Latin America Americas Earth
12 Argentina 2700 2500 200 0 Latin America Americas Earth
13 Asia 3000 3000 100 0 Asia Eurasia Earth
14 Japan 3100 3000 100 0 Asia Eurasia Earth
But i'm wondering if there is no smarter way to do this, especially as in my real task i have to do more than 3 left_joins, any thoughts?
I had a go below, but that does not work since the 'by' argument expects a simple string i guess...
df <- data
for (i in 1:3) {
df <- left_join(df, data2, by = c(paste0('level_',i) = 'code'), na_matches = 'never')
}
Upvotes: 3
Views: 93
Reputation: 887148
This could be done in an easier way by looping over the level
columns in mutate
with across
, get the index where it match
es with 'code' column, use that to subset the corresponding 'area',
library(dplyr)# 1.0.0
data1 <- data %>%
mutate(across(starts_with('level'), ~area[match(., code)],
.names = 'country{col}'))
data1
# area code level_1 level_2 level_3 countrylevel_1 countrylevel_2 countrylevel_3
#1 Earth 0 NA NA 0 <NA> <NA> Earth
#2 Europe 1000 1000 100 0 Europe Eurasia Earth
#3 Eurasia 100 NA 100 0 <NA> Eurasia Earth
#4 Spain 1100 1000 100 0 Europe Eurasia Earth
#5 Germany 1200 1000 100 0 Europe Eurasia Earth
#6 North America 2000 2000 200 0 North America Americas Earth
#7 Latin America 2500 2500 200 0 Latin America Americas Earth
#8 Americas 200 NA 200 0 <NA> Americas Earth
#9 US 2100 2000 200 0 North America Americas Earth
#10 Canada 2200 2000 200 0 North America Americas Earth
#11 Brazil 2600 2500 200 0 Latin America Americas Earth
#12 Argentina 2700 2500 200 0 Latin America Americas Earth
#13 Asia 3000 3000 100 0 Asia Eurasia Earth
#14 Japan 3100 3000 100 0 Asia Eurasia Earth
Or in base R
, we do the same logic by looping over the 'level' columns in lapply
, then get the index with match
and get the corresponding 'area'
nm1 <- grep('^level_\\d+$', names(data), value = TRUE)
data[paste0('country', 1:3)] <- lapply(data[nm1], function(x)
data$area[match(x, data$code)])
Or without looping by converting to a matrix
data[paste0('country', seq_along(nm1))] <- matrix(data$area[match(as.matrix(data[nm1]),
data$code)], ncol = length(nm1))
Upvotes: 2