Make contingency table from rows in R

Question

I have a covid data frame with 376 columns, 7 rows with covid infection numbers of 376 different days in 7 countries. I've matched them different severity categories and now I'm trying to make a contingency table containing the severity categories as columns and countries as rows. I've written a function and it works but I'm still wondering if there is a more elegant solution maybe including such a thing as a table() function with an aggregation for each row.

My code:

  severity <- function(x,countries){
    sev = c("Leicht","Mittel","Schwer")
    res=matrix(ncol=3,nrow=7)
    colnames(res) = sev
    rownames(res) = countries
    for (i in 1:nrow(x)){
      for (s in 1:length(sev)){
        res[i,s]=length(x2[i,x2[i,]==sev[s]])
      }
    }
    return(res)
}
r = severity(x2,covid_world2[,1]) #covid_world2 countains the countrynames, x2 the data with the categories
x = rbind(r,"Z" = colSums(r))
ctable=cbind(x,"S" = rowSums(x))

This is just an example of the first two rows in x2 (namely, representing the countries Canada,Germany)

dput(head(covid_world2[, 1:20]))

output is:

structure(list(Country = c("Canada", "France", "Germany", "Italy", 
"Japan", "United Kingdom"), X1_22_20 = c(0, 0, 0, 0, 1.58132191886678e-08, 
0), X1_23_20 = c(0, 0, 0, 0, 1.58132191886678e-08, 0), X1_24_20 = c(0, 
3.06403006266968e-08, 0, 0, 1.58132191886678e-08, 0), X1_25_20 = c(0, 
4.59604509400452e-08, 0, 0, 1.58132191886678e-08, 0), X1_26_20 = c(2.6495573093152e-08, 
4.59604509400452e-08, 0, 0, 3.16264383773357e-08, 0), X1_27_20 = c(2.6495573093152e-08, 
4.59604509400452e-08, 1.19354613321966e-08, 0, 3.16264383773357e-08, 
0), X1_28_20 = c(5.2991146186304e-08, 6.12806012533936e-08, 4.77418453287863e-08, 
0, 5.53462671603374e-08, 0), X1_29_20 = c(5.2991146186304e-08, 
7.6600751566742e-08, 4.77418453287863e-08, 0, 5.53462671603374e-08, 
0), X1_30_20 = c(5.2991146186304e-08, 7.6600751566742e-08, 4.77418453287863e-08, 
0, 8.69727055376731e-08, 0), X1_31_20 = c(1.05982292372608e-07, 
7.6600751566742e-08, 5.96773066609828e-08, 3.3078723093808e-08, 
1.18599143915009e-07, 2.94611506927399e-08), X02_01_2020 = c(1.05982292372608e-07, 
9.19209018800904e-08, 9.54836906575725e-08, 3.3078723093808e-08, 
1.58132191886678e-07, 2.94611506927399e-08), X02_02_2020 = c(1.05982292372608e-07, 
9.19209018800904e-08, 1.19354613321966e-07, 3.3078723093808e-08, 
1.58132191886678e-07, 2.94611506927399e-08), X02_03_2020 = c(1.05982292372608e-07, 
9.19209018800904e-08, 1.43225535986359e-07, 3.3078723093808e-08, 
1.58132191886678e-07, 1.1784460277096e-07), X02_04_2020 = c(1.05982292372608e-07, 
9.19209018800904e-08, 1.43225535986359e-07, 3.3078723093808e-08, 
1.73945411075346e-07, 1.1784460277096e-07), X02_05_2020 = c(1.3247786546576e-07, 
9.19209018800904e-08, 1.43225535986359e-07, 3.3078723093808e-08, 
1.8185202066968e-07, 1.3257517811733e-07), X02_06_2020 = c(1.3247786546576e-07, 
9.19209018800904e-08, 1.43225535986359e-07, 3.3078723093808e-08, 
1.8185202066968e-07, 1.3257517811733e-07), X02_07_2020 = c(1.85469011652064e-07, 
9.19209018800904e-08, 1.55160997318555e-07, 4.9618084640712e-08, 
1.8185202066968e-07, 1.3257517811733e-07), X02_08_2020 = c(1.85469011652064e-07, 
1.68521653446832e-07, 1.55160997318555e-07, 4.9618084640712e-08, 
1.89758630264014e-07, 1.9149747950281e-07), X02_09_2020 = c(1.85469011652064e-07, 
1.68521653446832e-07, 1.67096458650752e-07, 4.9618084640712e-08, 
1.89758630264014e-07, 2.06228054849179e-07)), row.names = c(NA, 
6L), class = "data.frame")

LMc · Accepted Answer

library(dplyr)
library(tidyr)

severity <- c("Leicht", "Mitte", "Schwer")
ranges <- c(0, 0.01, 0.05, Inf)

df %>% 
  dplyr::mutate(across(starts_with("X"), ~cut(., ranges, right = F, labels = severity))) %>% 
  tidyr::pivot_longer(cols = -Country,
                      names_to = "Date",
                      values_to = "Severity") %>% 
  dplyr::mutate(across(2, ~ as.Date(., tryFormats = "X%m_%d_%y"))) %>% # can delete line
  tidyr::pivot_wider(id_cols = Country,
                     names_from = Severity,
                     values_from = Severity,
                     values_fn = length)

 Country        Leicht
             
1 Canada             19
2 France             19
3 Germany            19
4 Italy              19
5 Japan              19
6 United Kingdom     19

How it works

mutate will apply the function cut across all columns that start with "X". Cut uses the provided ranges (excluding the upper bound) and labels values within those range with the appropriate label from the vector severity.

cut(0.01, range, right = F)
[1] [0.01,0.05)
Levels: [0,0.01) [0.01,0.05) [0.05,Inf)

You can see that 0.01 is included in the range [0.01,0.05) because the right bound is not inclusive due to right = F. Adding label = severity to this function will apply the proper label of "Mitte"

tidyr::pivot_longer will pivot all columns except Country to a long format. The column names will be stored in a new column called Date. The values of the columns after mutate will be stored in a new column called Severity.
The final mutate changes the column Date to a date vector. So "X2_01_20" becomes a date 2020-02-01. Note: you can delete this pipe, if you only care about the tabulation done in the next step. I included it in case you needed to look at things by date.
tidyr::pivot_wider will count the severity values for each country

Make contingency table from rows in R

Answers (1)

Related Questions