Reputation: 502
I have data that look like this:
pat# sex race group bmi
1 F Black 1 4
2 M Asian 2 8
3 M Asian 3 19
4 M Asian 1 35
5 F Black 2 12
6 F Black 3 33
7 M White 1 2
8 F Black 2 35
9 M Asian 3 6
10 F Black 1 13
11 F Black 2 18
12 F Asian 3 1
13 M White 1 36
14 F Asian 2 25
15 M White 3 6
16 M White 1 20
17 F Black 2 3
18 M Asian 3 23
19 F Black 1 26
20 F Asian 2 13
21 M White 3 21
22 M White 1 16
23 F Black 2 29
24 F Black 3 19
25 M Asian 1 17
26 M Asian 2 22
27 F Black 3 26
I would like to get the frequency of each variable and percent by group of each variable, like this:
n 1 2 3
sex M frequency % % %
F frequency % % %
next variable:
n 1 2 3
race White frequency % % %
Asian frequency % % %
Black frequency % % %
There are a lot of variables so I would rather not list each one. I've tried to use R's vector feature (df[2:30]
) using xtabs()
and dplyr
package but am not getting it to work. Which package or function doesn't matter but would like to make it flexible enough for future data that uses different column names and have different dimensions. Any advice is greatly appreciated!!
Upvotes: 1
Views: 1332
Reputation: 502
I was able to do this using the table()
function and tigerstats
package. The main problem I was having was R will treat a SAS dataset differently than a CSV dataset. Night and day!
Upvotes: 1
Reputation: 79208
If you need frequencies, you could do:
lapply(df[2:3], table, df$group)
$race
1 2 3
Asian 2 4 4
Black 3 5 3
White 4 0 2
$group
1 2 3
1 9 0 0
2 0 9 0
3 0 0 9
If you need percentages, you will have to define what percentage you need, ie rowwise, columnwise, total etc
If you need rowwise:
lapply(df[2:3], function(x)prop.table(table(x, df$group),1)*100)
$sex
x 1 2 3
F 21.42857 50.00000 28.57143
M 46.15385 15.38462 38.46154
$race
x 1 2 3
Asian 20.00000 40.00000 40.00000
Black 27.27273 45.45455 27.27273
White 66.66667 0.00000 33.33333
Upvotes: 0
Reputation: 388972
A tidyverse
way would be -
library(tidyverse)
df %>%
count(sex, group) %>%
group_by(sex) %>%
mutate(n = prop.table(n) * 100) %>%
pivot_wider(names_from = group, values_from = n, values_fill = 0)
# sex `1` `2` `3`
# <chr> <dbl> <dbl> <dbl>
#1 F 21.4 50 28.6
#2 M 46.2 15.4 38.5
If you want to do this for multiple variables you can use map
-
cols <- c('sex', 'race')
map(cols, ~df %>%
count(.data[[.x]], group) %>%
group_by(.data[[.x]]) %>%
mutate(n = prop.table(n) * 100) %>%
pivot_wider(names_from = group, values_from = n, values_fill = 0) %>%
ungroup)
#[[1]]
# A tibble: 2 x 4
# sex `1` `2` `3`
# <chr> <dbl> <dbl> <dbl>
#1 F 21.4 50 28.6
#2 M 46.2 15.4 38.5
#[[2]]
# A tibble: 3 x 4
# race `1` `2` `3`
# <chr> <dbl> <dbl> <dbl>
#1 Asian 20 40 40
#2 Black 27.3 45.5 27.3
#3 White 66.7 0 33.3
Upvotes: 0
Reputation: 21908
One way is to use janitor
package, but it will also change the totals into percents:
library(janitor)
df %>%
tabyl(sex, group) %>%
adorn_totals("col") %>%
adorn_percentages() %>%
adorn_pct_formatting(digits = 2)
sex 1 2 3 Total
F 21.43% 50.00% 28.57% 100.00%
M 46.15% 15.38% 38.46% 100.00%
#But we could also choose counts
df %>%
tabyl(sex, group) %>%
adorn_totals("col")
sex 1 2 3 Total
F 3 7 4 14
M 6 2 5 13
Upvotes: 2