Reputation: 143
I've searched high and low but can't work out how to get a summary count of items using group_by() when the variable is a factor rather than an integer. I am sure I am missing a simple trick here.
It's quite common to have multiple time periods associated with the same patient and in order to keep the data tidy, some variables, e.g. gender, do not change but are replicated across each time period.
Example:
library(tidyverse)

df <- tibble(patient_id = rep(1:4, each = 3),
             time_period = as_factor(rep(c("0 weeks", "6 weeks", "12 weeks"), times = 4)),
             gender = as_factor(rep(c("female", "male"), each = 3, times = 2)))
This gives the following tibble:
# A tibble: 12 × 3
   patient_id time_period gender
        <int> <fct>       <fct>
 1          1 0 weeks     female
 2          1 6 weeks     female
 3          1 12 weeks    female
 4          2 0 weeks     male
 5          2 6 weeks     male
 6          2 12 weeks    male
 7          3 0 weeks     female
 8          3 6 weeks     female
 9          3 12 weeks    female
10          4 0 weeks     male
11          4 6 weeks     male
12          4 12 weeks    male
Trying the following:
df %>%
  select(!time_period) %>%
  group_by(patient_id) %>%
  count(gender)
Just gives:
# A tibble: 4 × 3
# Groups:   patient_id [4]
  patient_id gender     n
       <int> <fct>  <int>
1          1 female     3
2          2 male       3
3          3 female     3
4          4 male       3
What I am looking for instead is the number of female and male patients once the repeated time periods have been collapsed into a single row per patient, i.e. 2 female and 2 male overall.
Upvotes: 3
Views: 1187
Reputation: 24722
df %>% distinct(patient_id, gender) %>% count(gender)
# A tibble: 2 x 2
  gender     n
  <fct>  <int>
1 female     2
2 male       2
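This works because distinct(patient_id, gender) keeps one row per patient before counting, so the repeated time periods no longer inflate the totals. If you would rather not drop rows first, a possible alternative is to count unique patient IDs within each gender using n_distinct(). The sketch below rebuilds the question's data with base factor() instead of as_factor() so that only dplyr is needed; the name result is just for illustration.

```r
library(dplyr)

# Same data as in the question; factor() is used here instead of
# forcats::as_factor() so the example depends only on dplyr.
df <- tibble(
  patient_id  = rep(1:4, each = 3),
  time_period = factor(rep(c("0 weeks", "6 weeks", "12 weeks"), times = 4)),
  gender      = factor(rep(c("female", "male"), each = 3, times = 2))
)

# Count the number of unique patients within each gender,
# ignoring how many time periods each patient has.
result <- df %>%
  group_by(gender) %>%
  summarise(n = n_distinct(patient_id))

print(result)
```

Both approaches assume that gender is constant within a patient. If a patient could appear with more than one gender value, that patient would be counted once under each value.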
Upvotes: 5