Reputation: 149
The WHO DownSyndrome database gives you information about the number of down syndrome babies born in 1M babies for different variables.
I wanted to create a cross tab with the absolute, relative, and marginal frequencies for the categorical variables Country
and Year
The desire results would look something like:
Year | 1978 1979 1980
Country | ----------------------------------------------------------------------
ALB | 76.4 78.5 87.6 ...
AUT | 87.6 67.9 80.0 …
AZE | 90.5 78.5 64.8 …
I tried a couple of things:
First, creating a new df for the three variables:
DS <- dplyr::select(DownSyndrome, YEAR, COUNTRY_REGION, VALUE)
But I can´t figure out how to use the table, and cross functions with that.
I managed to get the absolute frequencies with:
DS_wide <- DownSyndrome %>%
pivot_wider(
names_from = YEAR,
values_from = VALUE)
But then again, I dont know how to get the relative and margin frequencies from here.
Any help is appreciated, Thanks
PS:
Upvotes: 1
Views: 630
Reputation: 24790
Here's a dplyr
approach.
The first line converts the long form to wide form based on your code. The second line calculates the totals based on all of the columns with names that match the pattern of only numbers. The third line calculates the relative total for each country. The fourth line divides all of the columns with only numbers by the total column and assigns them to columns with a suffix of _RELATIVE
.
library(tidyverse)
DownSyndrome %>%
pivot_wider(names_from = YEAR,values_from = VALUE) %>%
mutate(TOTAL = rowSums(.[grep("[0-9]+",names(.))],na.rm=TRUE)) %>%
mutate(TOTAL_RELATIVE = TOTAL / sum(TOTAL)) %>%
mutate_at(vars(matches("[0-9]+")), list(RELATIVE = ~ ./TOTAL))
# A tibble: 49 x 104
# COUNTRY COUNTRY_GRP SEX `2008` `2009` `2010` `2011` `2012` `2013` `2014` `2015` `2016` `2017` `1985` `1986` `1987` `1988`
# <fct> <fct> <fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
# 1 ALB "" ALL 32 29 42 39 18 28 14 29 23 16 NA NA NA NA
# 2 AUT "" ALL 13 16 12 6 6 3 NA NA NA NA 23 29 39 24
# 3 AZE "" ALL 38 51 52 42 32 38 49 36 32 38 NA NA NA NA
# 4 BEL "" ALL 248 232 275 240 259 245 NA NA NA NA 56 126 152 126
# 5 BGR "" ALL 22 32 27 32 38 17 32 32 18 24 NA NA NA NA
# 6 BIH "" ALL NA NA NA NA NA NA NA NA NA NA 59 42 75 65
# 7 BLR "" ALL 69 60 44 61 46 50 56 53 63 56 NA NA NA NA
# 8 CHE "" ALL 121 107 127 145 152 152 131 108 112 86 NA NA NA NA
# 9 CZE "" ALL 43 49 46 53 49 45 45 43 NA NA NA NA NA 116
#10 DEU "" ALL 269 288 251 290 298 284 290 296 308 312 NA NA NA NA
If you only want the relative values, you could use select
.
DownSyndrome %>%
pivot_wider(names_from = YEAR,values_from = VALUE) %>%
mutate(TOTAL = rowSums(.[grep("[0-9]+",names(.))],na.rm=TRUE)) %>%
mutate(TOTAL_RELATIVE = TOTAL / sum(TOTAL)) %>%
mutate_at(vars(matches("[0-9]+")), list(RELATIVE = ~ ./TOTAL)) %>%
select(COUNTRY,matches("RELATIVE"),TOTAL_RELATIVE)
DownSyndrome <- read.csv("~/HFA_604_EN.csv",skip = 25)
Upvotes: 1