R fill a matrix by row using matrix row and column names

Question

I have a dataset that looks like this:

set.seed(2)
origin <- rep(c("DEU", "GBR", "ITA", "NLD", "CAN", "MEX", "USA", "CHN", "JPN", "KOR","DEU", "GBR", "ITA", "NLD", "CAN", "MEX", "USA", "CHN", "JPN", "KOR"), 2)
year <- rep(c(1998,1998,1998,1998,1998,1998,1998,1998,1998,1998,2000,2000,2000,2000,2000,2000,2000,2000,2000,2000), 2)
value <- sample(1:10000, size=length(origin), replace=TRUE)
test.df <- as.data.frame(cbind(origin, year, value))
rm(origin, year, value)

Then I have 2 lists.

The first one is a list of countries by region, built with the ISOcodes package as follows:

library("ISOcodes")
list.continent <- list(asia = c("Central Asia", "Eastern Asia", "South-eastern Asia", "Southern Asia", "Western Asia"),
             africa = c("Northern Africa", "Sub-Saharan Africa", "Eastern Africa", "Middle Africa", "Southern Africa", "Western Africa"),
             europe = c("Eastern Europe", "Northern Europe", "Channel Islands", "Southern Europe", "Western Europe"),
             oceania = c("Australia and New Zealand", "Melanesia", "Micronesia", "Polynesia"),
             northamerica = c("Northern America"),
             latinamerica = c("South America", "Central America", "Caribbean"))

country.list.continent <- sapply(list.continent, function(item) {    
    region <- subset(UN_M.49_Regions, Name %in% item)
    sub <- subset(UN_M.49_Countries, Code %in% unlist(strsplit(region$Children, ", ")))
    return(sub$ISO_Alpha_3)
}, simplify = FALSE)
rm(list.continent)

And another list with the years:

year.list <- levels(as.factor(unique(test.df$year)))

I want to fill a matrix with computed numbers corresponding to a given region in a given year. The matrix is set up as follows:

ncol <- length(year.list)
nrow <- length(country.list.continent)

matrix.extraction <- matrix(, nrow = nrow, ncol = ncol)
rownames(matrix.extraction) <- names(country.list.continent)
colnames(matrix.extraction) <- year.list

For the computation I use a loop over the years (equivalent to colnames(matrix.extraction)) so that I can subset the dataset, which is too big to process otherwise. The idea is to compute the share (in %) that each country's value represents in each year. The computation part is simple enough and works well. My problem arises when I need to assign the value to each row.

for(i in 1:length(colnames(matrix.extraction))){
    ### I subset and compute what I want
    table.temp <- test.df %>%
                subset(year == colnames(matrix.extraction)[i]) %>%
                group_by(origin) %>%
                summarise(value = sum(value, na.rm = TRUE))
    table.temp$percent <-  prop.table(table.temp$value)
    ### then I need to attribute the wanted values
    matrix.extraction["ROWNAME",i]  <- table.temp %>% 
                                subset(origin %in% country.list.continent$"ROWNAME") %>% 
                                summarise(., sum = sum(percent))
}

I really have no idea how to do this.

The expected result is a matrix like:

             1998 2000
asia         here   NA
africa         NA   NA
europe         NA   NA
oceania        NA   NA
northamerica   NA   NA
latinamerica   NA   NA

where "here" in [1,1] is replaced by the sum of the values of every country in the region given by the row name, for the year given by the column name.

Any help would be appreciated.

Ronak Shah · Accepted Answer

Using a double sapply, we can loop over all combinations of year.list and country.list.continent and calculate the sum of value for each combination.

sapply(year.list, function(x) sapply(names(country.list.continent), function(y) {
     with(test.df, sum(value[origin %in% country.list.continent[[y]] & year == x]))
 }))

#              1998  2000
#asia         21759 20059
#africa           0     0
#europe       39700 35981
#oceania          0     0
#northamerica 21347 17324
#latinamerica 10847  8672
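
The question asks for shares rather than raw sums. As a minimal sketch of one way to get there, the matrix of totals can be passed to prop.table() with margin = 2 so that every column (year) sums to 1. The names regions, df, years, totals and shares are hypothetical, and the data is a small made-up stand-in (hard-coded so it runs without ISOcodes), not the question's data.

```r
# Toy stand-in for country.list.continent (hypothetical, hard-coded so the
# sketch runs without the ISOcodes package).
regions <- list(
  asia         = c("CHN", "JPN", "KOR"),
  europe       = c("DEU", "GBR", "ITA", "NLD"),
  northamerica = c("CAN", "USA"),
  latinamerica = c("MEX")
)

# Small made-up data set with numeric year and value columns.
df <- data.frame(
  origin = c("DEU", "CHN", "USA", "MEX", "DEU", "CHN", "USA", "MEX"),
  year   = c(1998, 1998, 1998, 1998, 2000, 2000, 2000, 2000),
  value  = c(40, 30, 20, 10, 10, 20, 30, 40)
)
years <- as.character(sort(unique(df$year)))

# Same double sapply as above: rows are regions, columns are years.
totals <- sapply(years, function(x) sapply(names(regions), function(y) {
  with(df, sum(value[origin %in% regions[[y]] & year == x]))
}))

# Divide each column by its own total to get each region's share per year.
shares <- prop.table(totals, margin = 2)
```

Multiply shares by 100 for percentages. If a year has a total of zero, its column would come out as NaN, so that case may need separate handling.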

If we are interested in a tidyverse solution:

library(tidyverse)

crossing(x = year.list, y = names(country.list.continent)) %>%
     mutate(sum = map2_dbl(x, y, ~ 
               test.df %>% 
                 filter(year == .x & origin %in% country.list.continent[[.y]]) %>%
                 summarise(total = sum(value)) %>%
                 pull(total)))

#    x     y              sum
#    <chr> <chr>        <dbl>
# 1 1998  africa           0
# 2 1998  asia         21759
# 3 1998  europe       39700
# 4 1998  latinamerica 10847
# 5 1998  northamerica 21347
# 6 1998  oceania          0
# 7 2000  africa           0
# 8 2000  asia         20059
# 9 2000  europe       35981
#10 2000  latinamerica  8672
#11 2000  northamerica 17324
#12 2000  oceania          0
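
The tidyverse result is in long format, while the question asks for a region-by-year matrix. One possible way to reshape it, sketched here in base R with xtabs(), is shown below. The data frame long is typed in by hand from the output above, and the names long and wide are hypothetical.

```r
# Long-format result, copied by hand from the output above.
long <- data.frame(
  x   = rep(c("1998", "2000"), each = 6),
  y   = rep(c("africa", "asia", "europe",
              "latinamerica", "northamerica", "oceania"), times = 2),
  sum = c(0, 21759, 39700, 10847, 21347, 0,
          0, 20059, 35981,  8672, 17324, 0)
)

# Cross-tabulate: regions (y) become rows, years (x) become columns.
wide <- xtabs(sum ~ y + x, data = long)

# Optional: put the rows back in the order used in the question.
wide <- wide[c("asia", "africa", "europe",
               "oceania", "northamerica", "latinamerica"), ]
```

The result is a two-dimensional table that can be indexed by row and column name, for example wide["asia", "1998"].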

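The numbers in test.df are stored as factors (or as character strings in R 4.0 and later, where stringsAsFactors defaults to FALSE), because cbind() coerces all columns to text. They need to be converted to actual numbers, so run the line below before applying either of the methods above.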

test.df[-1] <- lapply(test.df[-1], function(x) as.numeric(as.character(x)))
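
To illustrate why the conversion goes through as.character(), here is a small sketch with a made-up factor f (a hypothetical name, not part of the question's data). Calling as.numeric() directly on a factor returns its internal level codes rather than the numbers it displays.

```r
# Numbers stored as a factor, as can happen when cbind() turns columns into text.
f <- factor(c("300", "20", "1000"))

# Direct conversion returns the integer level codes, not the original numbers.
codes <- as.numeric(f)

# Going through character first recovers the original numbers.
values <- as.numeric(as.character(f))
```

Here values holds 300, 20 and 1000, while codes holds only the positions of those strings among the sorted levels.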
