Reputation: 2506
The code below does what I want for a simple table. The mapping that takes place in the statement with on
works perfectly. But I also have the situation with multiple countries that need to be assigned potentially to multiple regions and the result stored in the regions column is more challenging
library(data.table)
testDT <- data.table(country = c("Algeria", "Egypt", "United States", "Brazil"))
testDTcomplicated <- data.table(country = c("Algeria, Ghana, Sri Lanka", "Egypt", "United States, Argentina", "Brazil"))
regionLookup <- data.table(countrylookup = c("Algeria", "Argentina", "Egypt", "United States", "Brazil", "Ghana", "Sri Lanka"), regionVal = c("Africa", "South America", "Africa", "North America", "South America", "Africa", "Asia"))
testDT[regionLookup, region := regionVal, on = c(country = "countrylookup")]
> testDT
country region
1: Algeria Africa
2: Egypt Africa
3: United States North America
4: Brazil South America
I'd like to have testDTcomplicated look like the following
> testDT
country region
1: Algeria, Ghana, Sri Lanka Africa, Africa, Asia
2: Egypt Africa
3: United States, Argentina, Brazil North America, South America, South America
4: Brazil South America
Upvotes: 1
Views: 66
Reputation: 388817
You could split the data on comma and get each country in a separate row, join the data with regionLookup
and collapse them again in one value in a comma-separated string.
library(data.table)
testDTcomplicated[, row := seq_len(.N)]
new <- splitstackshape::cSplit(testDTcomplicated, 'country', ',',
direction = 'long')[regionLookup, region := regionVal,
on = c(country = "countrylookup")]
new <- new[, lapply(.SD, toString), row][,row:=NULL]
new
# country region
#1: Algeria, Ghana, Sri Lanka Africa, Africa, Asia
#2: Egypt Africa
#3: United States, Argentina North America, South America
#4: Brazil South America
Same logic in dplyr
can be implemented as :
library(dplyr)
testDTcomplicated %>%
mutate(row = row_number()) %>%
tidyr::separate_rows(country, sep = ", ") %>%
left_join(regionLookup, by = c("country" = "countrylookup")) %>%
group_by(row) %>%
summarise(across(.fns = toString))
Upvotes: 3