Reputation: 502

Trying to get frequnecy counts and percent by group of each column in data frame in R

I have data that look like this:

   pat# sex race    group   bmi
    1   F   Black   1   4
    2   M   Asian   2   8
    3   M   Asian   3   19
    4   M   Asian   1   35
    5   F   Black   2   12
    6   F   Black   3   33
    7   M   White   1   2
    8   F   Black   2   35
    9   M   Asian   3   6
    10  F   Black   1   13
    11  F   Black   2   18
    12  F   Asian   3   1
    13  M   White   1   36
    14  F   Asian   2   25
    15  M   White   3   6
    16  M   White   1   20
    17  F   Black   2   3
    18  M   Asian   3   23
    19  F   Black   1   26
    20  F   Asian   2   13
    21  M   White   3   21
    22  M   White   1   16
    23  F   Black   2   29
    24  F   Black   3   19
    25  M   Asian   1   17
    26  M   Asian   2   22
    27  F   Black   3   26

I would like to get the frequency of each variable and percent by group of each variable, like this:

        n           1   2   3
sex M   frequency   %   %   %
    F   frequency   %   %   %

next variable:

                n          1    2   3
race    White   frequency   %   %   %
        Asian   frequency   %   %   %
        Black   frequency   %   %   %

There are a lot of variables so I would rather not list each one. I've tried to use R's vector feature (df[2:30]) using xtabs() and dplyr package but am not getting it to work. Which package or function doesn't matter but would like to make it flexible enough for future data that uses different column names and have different dimensions. Any advice is greatly appreciated!!

Upvotes: 1

Answers (4)

wisamb

Reputation: 502

I was able to do this using the table() function and tigerstats package. The main problem I was having was R will treat a SAS dataset differently than a CSV dataset. Night and day!

Upvotes: 1

Onyambu

Reputation: 79208

If you need frequencies, you could do:

lapply(df[2:3], table, df$group)
$race
       
        1 2 3
  Asian 2 4 4
  Black 3 5 3
  White 4 0 2

$group
   
    1 2 3
  1 9 0 0
  2 0 9 0
  3 0 0 9

If you need percentages, you will have to define what percentage you need, ie rowwise, columnwise, total etc

If you need rowwise:

lapply(df[2:3], function(x)prop.table(table(x, df$group),1)*100)
$sex
   
x          1        2        3
  F 21.42857 50.00000 28.57143
  M 46.15385 15.38462 38.46154

$race
       
x              1        2        3
  Asian 20.00000 40.00000 40.00000
  Black 27.27273 45.45455 27.27273
  White 66.66667  0.00000 33.33333

Upvotes: 0

Ronak Shah

Reputation: 388972

A tidyverse way would be -

library(tidyverse)

df %>%
  count(sex, group) %>%
  group_by(sex) %>%
  mutate(n = prop.table(n) * 100) %>%
  pivot_wider(names_from = group, values_from = n, values_fill = 0)

#  sex     `1`   `2`   `3`
#  <chr> <dbl> <dbl> <dbl>
#1 F      21.4  50    28.6
#2 M      46.2  15.4  38.5

If you want to do this for multiple variables you can use map -

cols <- c('sex', 'race')

map(cols, ~df %>%
      count(.data[[.x]], group) %>%
      group_by(.data[[.x]]) %>%
      mutate(n = prop.table(n) * 100) %>%
      pivot_wider(names_from = group, values_from = n, values_fill = 0) %>%
      ungroup)

#[[1]]
# A tibble: 2 x 4
#  sex     `1`   `2`   `3`
#  <chr> <dbl> <dbl> <dbl>
#1 F      21.4  50    28.6
#2 M      46.2  15.4  38.5

#[[2]]
# A tibble: 3 x 4
#  race    `1`   `2`   `3`
#  <chr> <dbl> <dbl> <dbl>
#1 Asian  20    40    40  
#2 Black  27.3  45.5  27.3
#3 White  66.7   0    33.3

Upvotes: 0

Anoushiravan R

Reputation: 21908

One way is to use janitor package, but it will also change the totals into percents:

library(janitor)

df %>%
  tabyl(sex, group) %>%
  adorn_totals("col") %>%
  adorn_percentages() %>%
  adorn_pct_formatting(digits = 2)

 sex      1      2      3   Total
   F 21.43% 50.00% 28.57% 100.00%
   M 46.15% 15.38% 38.46% 100.00%

#But we could also choose counts
df %>%
  tabyl(sex, group) %>%
  adorn_totals("col")

 sex 1 2 3 Total
   F 3 7 4    14
   M 6 2 5    13

Upvotes: 2

Trying to get frequnecy counts and percent by group of each column in data frame in R

Answers (4)

Related Questions