Frederick
Frederick

Reputation: 850

How to get the frequency from grouped data with dplyr?

This certainly is a basic question but I cannot figure it out by myself. Please consider the following:

In a large dataset with patient characteristics in long format I want to summarise some variables. I would prefer to use dplyr for that.

For the example data set:

db <- data.frame(ID = c(rep(1, 3), rep(2,4), rep(3, 2)),
                  Gender = factor(c(rep("woman", 7), rep("man", 2))),
                  Grade = c(rep(3, 3), rep(1, 4), rep(2, 2)))
db
#    ID Gender Grade
#  1  1 woman     3
#  2  1 woman     3
#  3  1 woman     3
#  4  2 woman     1
#  5  2 woman     1
#  6  2 woman     1
#  7  2 woman     1
#  8  3   man     2
#  9  3   man     2

I would like to make a frequency table for Gender and Grade. Obviously, there are 2 female patients and 1 male. Each grade (1:3) occurs once.

I tried:

x <- db %>% group_by(ID, Gender, Grade)
  table(y$Gender)
x
# A tibble: 9 x 3
# Groups:   ID, Gender, Grade [3]
#     ID Gender Grade
#  <dbl> <fct>  <dbl>
# 1    1. woman     3.
# 2    1. woman     3.
# 3    1. woman     3.
# 4    2. woman     1.
# 5    2. woman     1.
# 6    2. woman     1.
# 7    2. woman     1.
# 8    3. man       2.
# 9    3. man       2.

but when I call for instance table(x$Gender), the outcome is:

table(y$Gender)

#    man woman 
#      2     7 

What am I doing wrong?

Thanks a lot in advance!

Edit: The desired output is to have a frequency table of how many male/female participants there are in the dataset, as well as how many patients have grade 1, 2, 3 etc. Please see below.

With the following I can call the percentage of females in db:

db %>%
summarise(pct.female = mean(Gender == "woman", na.rm = T))
#    pct.female
# 1  0.7777778

What I would rather need is the amount of males/females (n). Something like this:

# man    woman
#   1        2

Upvotes: 0

Views: 603

Answers (2)

IceCreamToucan
IceCreamToucan

Reputation: 28705

require(dplyr)
require(magrittr)
db %>% count(ID, Gender) %$% table(Gender)

Or, without dplyr

require(magrittr)
db %$% split(Gender, ID) %>% sapply(unique) %>% table

Upvotes: 2

tjebo
tjebo

Reputation: 23797

require(dplyr)
db %>% group_by(Gender, Grade) %>% tally()

# A tibble: 3 x 3
# Groups:   Gender [?]
  Gender Grade     n
  <fct>  <dbl> <int>
1 man     2.00     2
2 woman   1.00     4
3 woman   3.00     3

# Was also suggested by @konvas in their comment.

will tell you all the unique combinations of Gender and Grade. And how many each of those exist. This what you want? Difficult to say from your question. Desired output would be good.


edit Alternatively, as per requested output:

db %>% distinct(ID, Gender) %>% count(Gender) 

# A tibble: 2 x 2
  Gender `n()`
  <fct>  <int>
1 man        1
2 woman      2

Upvotes: 2

Related Questions