Ollie

Reputation: 143

Summarising counts of factors

I've searched high and low but can't work out how to get a summary count of items using group_by when the variable is a factor rather than an integer. I'm sure I'm missing a simple trick here.

It's quite common to have multiple time periods associated with the same patient. To keep the data tidy, variables that do not change, e.g. gender, are replicated across each time period.

Example:

library(tidyverse)  # provides tibble(), as_factor() and the dplyr verbs used below

df <- tibble(patient_id = rep(1:4, each = 3),
             time_period = as_factor(rep(c("0 weeks", "6 weeks", "12 weeks"), times = 4)),
             gender = as_factor(rep(c("female", "male"), each = 3, times = 2)))

This gives the following tibble:

   # A tibble: 12 × 3
   patient_id time_period gender
        <int> <fct>       <fct> 
 1          1 0 weeks     female
 2          1 6 weeks     female
 3          1 12 weeks    female
 4          2 0 weeks     male  
 5          2 6 weeks     male  
 6          2 12 weeks    male  
 7          3 0 weeks     female
 8          3 6 weeks     female
 9          3 12 weeks    female
10          4 0 weeks     male  
11          4 6 weeks     male  
12          4 12 weeks    male  

Trying the following:

df %>% 
  select(!time_period) %>%
  group_by(patient_id) %>% 
  count(gender)

Just gives:

# A tibble: 4 × 3
# Groups:   patient_id [4]
  patient_id gender     n
       <int> <fct>  <int>
1          1 female     3
2          2 male       3
3          3 female     3
4          4 male       3

Whereas I am looking for the number of female and male patients once the repeated time periods have been collapsed to one row per patient, i.e. 2 female and 2 male overall.

Upvotes: 3

Views: 1187

Answers (1)

langtang

Reputation: 24722

# Keep one row per patient/gender pair, then count those rows by gender
df %>% distinct(patient_id, gender) %>% count(gender)

# A tibble: 2 x 2
  gender     n
  <fct>  <int>
1 female     2
2 male       2
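
The original attempt returns 3 for every patient because count() tallies rows, and each patient has three rows (one per time period). The distinct() call above removes that repetition before counting. As a sketch of an equivalent alternative, you could instead group by gender and count unique patient IDs with n_distinct(). The data frame below is a stand-in rebuilt from the question (using base factor() rather than as_factor(), which is an assumption made only to keep the example dependent on dplyr alone), and the name patients_per_gender is purely illustrative:

```r
library(dplyr)

# Stand-in for the question's data: 4 patients, 3 time periods each
df <- tibble(
  patient_id  = rep(1:4, each = 3),
  time_period = factor(rep(c("0 weeks", "6 weeks", "12 weeks"), times = 4)),
  gender      = factor(rep(c("female", "male"), each = 3, times = 2))
)

# Count unique patients within each gender instead of counting rows
patients_per_gender <- df %>%
  group_by(gender) %>%
  summarise(n = n_distinct(patient_id))

print(patients_per_gender)
```

Both approaches should agree as long as each patient has a single gender value. If a patient were recorded with two different genders, distinct() followed by count() would count that patient once under each, and so would this version.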

Upvotes: 5
