DrPaulVella
DrPaulVella

Reputation: 449

Drop empty groups of categories in a summary table

I'm running a frequency table of several variables in my data like so:

frequencies1 <- data %>%
  gather(variable, value,
    Age, 
    Gender, 
    Comparison,
    Tenure
    ) %>%
  group_by(variable, value) %>%
  summarise (n = n()) %>%
  mutate(freq = ((n / sum(n, na.rm = TRUE)) * 100))

The output:

variable            value          n  freq
Age                 18-24          9  0.692
Age                 25-29          2  0.153
Age                 30-34          2  0.153 <- age % = 100% of 13
Comparison                         4  0.307
Comparison          Better         2  0.153
Comparison          Same           4  0.307
Comparison          Worse          3  0.230 <- comparison % = 100% of 13
WhereSeen_Facebook  1              2  0.153
WhereSeen_Instagram 1              5  0.384
WhereSeen_TV        1              9  0.692
Tenure                             1  0.076
Tenure              1 year or less 12 0.923

Each variable does not have the same amount of responses (i.e. Age n=13, Comparison n=9, etc...) which ends up showing as blanks under value. Is there a way to run the frequencies and filter out empty rows per variable? The proportions are based on the total sample when they should be only for the non-empty values.

Desired output:

variable            value          n freq
Age                 18-24          9 0.692
Age                 25-29          2 0.153
Age                 30-34          2 0.153 <- age % = 100% of n = 13
Comparison          Better         2 0.222
Comparison          Same           4 0.444
Comparison          Worse          3 0.333 <- comparison % = 100% of n = 9
WhereSeen_Facebook  1              2 
WhereSeen_Instagram 1              5
WhereSeen_TV        1              9
Tenure              1 year or less 12

Upvotes: 0

Views: 205

Answers (1)

mikeytop
mikeytop

Reputation: 170

I think you can add in a filter(value != "") (assuming value is a string and the empty value is just "") before you group_by.

So you'd have:

frequencies1 <- data %>%
  gather(variable, value,
    Age, 
    Gender, 
    Comparison,
    Tenure
    ) %>%
  filter(value != "") %<%
  group_by(variable, value) %>%
  summarise (n = n()) %>%
  mutate(freq = ((n / sum(n, na.rm = TRUE)) * 100))

Upvotes: 1

Related Questions