How to force 0-count summary cells and add totals to columns and rows (R tidyverse)

Question

Okay, here is a sample of what my data looks like:

GRADE_LVL	COURSE_NAME	COURSE_CODE	STUDENT_GENDER	ETHNICITY	OUTCOME
12	Physics	03165	Male	White	Pass
12	Physics	03165	Female	White	Pass
12	Physics	03165	Nonbinary	Black or African American	Pass
9	Algebra I	02052	Female	Multiracial	Pass
10	Algebra I	02052	Female	White	Fail

I need to report on 3 genders (male, female, nonbinary), and 7 ethnicities (Hispanic or Latino, American Indian or Alaska Native, Asian, Native Hawaiian/Other Pacific Islander, Black or African American, White, and Multiracial).

I am trying to write a function in R that will produce a table of the demographic facts for a set of parameters passed to it. I want the output of the function to produce a tibble that looks like this :

High School Students Who Passed Algebra I

	Hispanic	Asian	Black	White	Multiracial	Total
Male	0	7	2	13	4	26
Female	1	3	1	12	3	20
Nonbinary	0	0	0	1	0	1
Total	1	10	3	26	7	47

Notes: This contains only dummy values that aren't related to those in my sample data snippet above. I have shortened the column names to save space on the screen.

Here is the code I have so far:

data <- dbGetQuery('secrets')
highSchool = c('09', '10', '11', '12')
passingOnly <- quo(OUTCOME == 'Pass')
algebra1 <- quo(COURSE_CODE == '02052')
ethnicCategories <- factor(c(
                              'Hispanic or Latino', 
                              'American Indian or Alaska Native',
                              'Asian',
                              'Native Hawaiian/Other Pacific Islander',
                              'Black or African American',
                              'White',
                              'Multiracial'
                            ))
genderCategories <- factor(c('Female', 'Male', 'Nonbinary'))

demographicBreakout <- function(filterConditions, gradeLevels) {
      data %>%
      filter( {{ filterConditions }} ) %>%
      filter(GRADE_LVL %in% gradeLevels) %>%
      select(STUDENT_GENDER, ETHNIC_DESC) %>%
      group_by(STUDENT_GENDER, ETHNIC_DESC) %>%
      summarise(COUNT = n()) %>%
      pivot_wider(
        names_from = ETHNIC_DESC, 
        values_from = COUNT, 
        values_fill = 0
      ) %>%
      rename_at("STUDENT_GENDER", ~"Gender")
}

report <- demographicBreakout(
          filterConditions = !!quo(!!algebra1 & !!passingOnly),
          gradeLevels = highSchool
      )

This code produces a tibble that looks like this:

Gender	Hispanic	Asian	Black	White	Multiracial
Female	2	15	2	26	9
Male	12	23	1	43	11

This is looking good so far, but I need to have all demographic categories present in the table even if the count is 0. I tried adding the following code snippet to my demographicBreakout function in between the summarize and pivot_wider statements:

      complete(
        ETHNIC_DESC = ethnicCategories,
        STUDENT_GENDER = genderCategories, 
        fill = list(COUNT = 0)
      ) %>%

Adding this code caused the following error:

Error in `reframe()`:
ℹ In argument: `complete(data = pick(everything()), ..., fill = fill, explicit = explicit)`.
ℹ In group 1: `STUDENT_GENDER = Female`.
Caused by error in `dplyr::full_join()`:
! Join columns in `y` must be present in the data.
✖ Problem with `STUDENT_GENDER`.

I have been unable to get this error resolved. In addition to this complete() statement, I also need to run something similar on the rows so that nonbinary counts appear. On top of all that, I still need to add in the row and column totals.

Any help getting over my current hurdle would be greatly appreciated.

Mark · Accepted Answer

The function can be simplified a lot, when we combine the factor conversion idea from Stefan's (now-deleted) answer, and the .drop = FALSE argument for the count() function:

ethnicCategories <- c('Hispanic or Latino', 
                    'American Indian or Alaska Native',
                    'Asian',
                    'Native Hawaiian/Other Pacific Islander',
                    'Black or African American',
                    'White',
                    'Multiracial')
genderCategories <- c('Female', 'Male', 'Nonbinary')

demographicBreakout <- function(filterConditions, gradeLevels) {
  data |>
    filter({{filterConditions}} & GRADE_LVL %in% gradeLevels) |>
    mutate(Gender = factor(STUDENT_GENDER, levels = genderCategories),
            e = factor(ETHNIC_DESC, levels = ethnicCategories)) |>
    count(Gender, e, .drop = FALSE) |>
    pivot_wider(names_from = e, values_from = n)
}

Example:

demographicBreakout(
          filterConditions = !!quo(COURSE_CODE == '02052'),
          gradeLevels = c('09', '10', '11', '12')
      ) |> print(width = Inf)

Output:

# A tibble: 3 × 8
  Gender    `Hispanic or Latino` `American Indian or Alaska Native` Asian
                                                     
1 Female                       0                                  0     0
2 Male                         0                                  0     0
3 Nonbinary                    0                                  0     0
  `Native Hawaiian/Other Pacific Islander` `Black or African American` White
                                                             
1                                        0                           0     1
2                                        0                           0     0
3                                        0                           0     0
  Multiracial
        
1           0
2           0
3           0

How to force 0-count summary cells and add totals to columns and rows (R tidyverse)

Answers (1)

Related Questions