Reputation: 327
I have a covid data frame with 376 columns, 7 rows with covid infection numbers of 376 different days in 7 countries. I've matched them different severity categories and now I'm trying to make a contingency table containing the severity categories as columns and countries as rows. I've written a function and it works but I'm still wondering if there is a more elegant solution maybe including such a thing as a table() function with an aggregation for each row.
My code:
severity <- function(x,countries){
sev = c("Leicht","Mittel","Schwer")
res=matrix(ncol=3,nrow=7)
colnames(res) = sev
rownames(res) = countries
for (i in 1:nrow(x)){
for (s in 1:length(sev)){
res[i,s]=length(x2[i,x2[i,]==sev[s]])
}
}
return(res)
}
r = severity(x2,covid_world2[,1]) #covid_world2 countains the countrynames, x2 the data with the categories
x = rbind(r,"Z" = colSums(r))
ctable=cbind(x,"S" = rowSums(x))
This is just an example of the first two rows in x2 (namely, representing the countries Canada,Germany)
dput(head(covid_world2[, 1:20]))
output is:
structure(list(Country = c("Canada", "France", "Germany", "Italy",
"Japan", "United Kingdom"), X1_22_20 = c(0, 0, 0, 0, 1.58132191886678e-08,
0), X1_23_20 = c(0, 0, 0, 0, 1.58132191886678e-08, 0), X1_24_20 = c(0,
3.06403006266968e-08, 0, 0, 1.58132191886678e-08, 0), X1_25_20 = c(0,
4.59604509400452e-08, 0, 0, 1.58132191886678e-08, 0), X1_26_20 = c(2.6495573093152e-08,
4.59604509400452e-08, 0, 0, 3.16264383773357e-08, 0), X1_27_20 = c(2.6495573093152e-08,
4.59604509400452e-08, 1.19354613321966e-08, 0, 3.16264383773357e-08,
0), X1_28_20 = c(5.2991146186304e-08, 6.12806012533936e-08, 4.77418453287863e-08,
0, 5.53462671603374e-08, 0), X1_29_20 = c(5.2991146186304e-08,
7.6600751566742e-08, 4.77418453287863e-08, 0, 5.53462671603374e-08,
0), X1_30_20 = c(5.2991146186304e-08, 7.6600751566742e-08, 4.77418453287863e-08,
0, 8.69727055376731e-08, 0), X1_31_20 = c(1.05982292372608e-07,
7.6600751566742e-08, 5.96773066609828e-08, 3.3078723093808e-08,
1.18599143915009e-07, 2.94611506927399e-08), X02_01_2020 = c(1.05982292372608e-07,
9.19209018800904e-08, 9.54836906575725e-08, 3.3078723093808e-08,
1.58132191886678e-07, 2.94611506927399e-08), X02_02_2020 = c(1.05982292372608e-07,
9.19209018800904e-08, 1.19354613321966e-07, 3.3078723093808e-08,
1.58132191886678e-07, 2.94611506927399e-08), X02_03_2020 = c(1.05982292372608e-07,
9.19209018800904e-08, 1.43225535986359e-07, 3.3078723093808e-08,
1.58132191886678e-07, 1.1784460277096e-07), X02_04_2020 = c(1.05982292372608e-07,
9.19209018800904e-08, 1.43225535986359e-07, 3.3078723093808e-08,
1.73945411075346e-07, 1.1784460277096e-07), X02_05_2020 = c(1.3247786546576e-07,
9.19209018800904e-08, 1.43225535986359e-07, 3.3078723093808e-08,
1.8185202066968e-07, 1.3257517811733e-07), X02_06_2020 = c(1.3247786546576e-07,
9.19209018800904e-08, 1.43225535986359e-07, 3.3078723093808e-08,
1.8185202066968e-07, 1.3257517811733e-07), X02_07_2020 = c(1.85469011652064e-07,
9.19209018800904e-08, 1.55160997318555e-07, 4.9618084640712e-08,
1.8185202066968e-07, 1.3257517811733e-07), X02_08_2020 = c(1.85469011652064e-07,
1.68521653446832e-07, 1.55160997318555e-07, 4.9618084640712e-08,
1.89758630264014e-07, 1.9149747950281e-07), X02_09_2020 = c(1.85469011652064e-07,
1.68521653446832e-07, 1.67096458650752e-07, 4.9618084640712e-08,
1.89758630264014e-07, 2.06228054849179e-07)), row.names = c(NA,
6L), class = "data.frame")
Upvotes: 2
Views: 180
Reputation: 18632
library(dplyr)
library(tidyr)
severity <- c("Leicht", "Mitte", "Schwer")
ranges <- c(0, 0.01, 0.05, Inf)
df %>%
dplyr::mutate(across(starts_with("X"), ~cut(., ranges, right = F, labels = severity))) %>%
tidyr::pivot_longer(cols = -Country,
names_to = "Date",
values_to = "Severity") %>%
dplyr::mutate(across(2, ~ as.Date(., tryFormats = "X%m_%d_%y"))) %>% # can delete line
tidyr::pivot_wider(id_cols = Country,
names_from = Severity,
values_from = Severity,
values_fn = length)
Country Leicht
<chr> <int>
1 Canada 19
2 France 19
3 Germany 19
4 Italy 19
5 Japan 19
6 United Kingdom 19
How it works
mutate
will apply the function cut
across all columns that start with "X". Cut uses the provided ranges (excluding the upper bound) and labels values within those range with the appropriate label from the vector severity
.cut(0.01, range, right = F)
[1] [0.01,0.05)
Levels: [0,0.01) [0.01,0.05) [0.05,Inf)
You can see that 0.01 is included in the range [0.01,0.05)
because the right bound is not inclusive due to right = F
. Adding label = severity
to this function will apply the proper label of "Mitte"
tidyr::pivot_longer
will pivot all columns except Country
to a long format. The column names will be stored in a new column called Date
. The values of the columns after mutate
will be stored in a new column called Severity
.mutate
changes the column Date
to a date vector. So "X2_01_20"
becomes a date 2020-02-01
. Note: you can delete this pipe, if you only care about the tabulation done in the next step. I included it in case you needed to look at things by date.tidyr::pivot_wider
will count the severity values for each countryUpvotes: 1