Reputation: 850
This certainly is a basic question but I cannot figure it out by myself. Please consider the following:
In a large dataset with patient characteristics in long format I want to summarise some variables. I would prefer to use dplyr
for that.
For the example data set:
db <- data.frame(ID = c(rep(1, 3), rep(2,4), rep(3, 2)),
Gender = factor(c(rep("woman", 7), rep("man", 2))),
Grade = c(rep(3, 3), rep(1, 4), rep(2, 2)))
db
# ID Gender Grade
# 1 1 woman 3
# 2 1 woman 3
# 3 1 woman 3
# 4 2 woman 1
# 5 2 woman 1
# 6 2 woman 1
# 7 2 woman 1
# 8 3 man 2
# 9 3 man 2
I would like to make a frequency table for Gender
and Grade
. Obviously, there are 2 female patients and 1 male. Each grade (1:3) occurs once.
I tried:
x <- db %>% group_by(ID, Gender, Grade)
table(y$Gender)
x
# A tibble: 9 x 3
# Groups: ID, Gender, Grade [3]
# ID Gender Grade
# <dbl> <fct> <dbl>
# 1 1. woman 3.
# 2 1. woman 3.
# 3 1. woman 3.
# 4 2. woman 1.
# 5 2. woman 1.
# 6 2. woman 1.
# 7 2. woman 1.
# 8 3. man 2.
# 9 3. man 2.
but when I call for instance table(x$Gender)
, the outcome is:
table(y$Gender)
# man woman
# 2 7
What am I doing wrong?
Thanks a lot in advance!
Edit: The desired output is to have a frequency table of how many male/female participants there are in the dataset, as well as how many patients have grade 1, 2, 3 etc. Please see below.
With the following I can call the percentage of females in db:
db %>%
summarise(pct.female = mean(Gender == "woman", na.rm = T))
# pct.female
# 1 0.7777778
What I would rather need is the amount of males/females (n). Something like this:
# man woman
# 1 2
Upvotes: 0
Views: 603
Reputation: 28705
require(dplyr)
require(magrittr)
db %>% count(ID, Gender) %$% table(Gender)
Or, without dplyr
require(magrittr)
db %$% split(Gender, ID) %>% sapply(unique) %>% table
Upvotes: 2
Reputation: 23797
require(dplyr)
db %>% group_by(Gender, Grade) %>% tally()
# A tibble: 3 x 3
# Groups: Gender [?]
Gender Grade n
<fct> <dbl> <int>
1 man 2.00 2
2 woman 1.00 4
3 woman 3.00 3
# Was also suggested by @konvas in their comment.
will tell you all the unique combinations of Gender and Grade. And how many each of those exist. This what you want? Difficult to say from your question. Desired output would be good.
edit Alternatively, as per requested output:
db %>% distinct(ID, Gender) %>% count(Gender)
# A tibble: 2 x 2
Gender `n()`
<fct> <int>
1 man 1
2 woman 2
Upvotes: 2