OnlyDean
OnlyDean

Reputation: 1039

How to force 0-count summary cells and add totals to columns and rows (R tidyverse)

Okay, here is a sample of what my data looks like:

GRADE_LVL COURSE_NAME COURSE_CODE STUDENT_GENDER ETHNICITY OUTCOME
12 Physics 03165 Male White Pass
12 Physics 03165 Female White Pass
12 Physics 03165 Nonbinary Black or African American Pass
9 Algebra I 02052 Female Multiracial Pass
10 Algebra I 02052 Female White Fail

I need to report on 3 genders (male, female, nonbinary), and 7 ethnicities (Hispanic or Latino, American Indian or Alaska Native, Asian, Native Hawaiian/Other Pacific Islander, Black or African American, White, and Multiracial).

I am trying to write a function in R that will produce a table of the demographic facts for a set of parameters passed to it. I want the output of the function to produce a tibble that looks like this :

High School Students Who Passed Algebra I

Hispanic Native American Asian Black White Multiracial Total
Male 0 0 7 2 13 4 26
Female 1 0 3 1 12 3 20
Nonbinary 0 0 0 0 1 0 1
Total 1 0 10 3 26 7 47

Notes: This contains only dummy values that aren't related to those in my sample data snippet above. I have shortened the column names to save space on the screen.

Here is the code I have so far:

data <- dbGetQuery('secrets')
highSchool = c('09', '10', '11', '12')
passingOnly <- quo(OUTCOME == 'Pass')
algebra1 <- quo(COURSE_CODE == '02052')
ethnicCategories <- factor(c(
                              'Hispanic or Latino', 
                              'American Indian or Alaska Native',
                              'Asian',
                              'Native Hawaiian/Other Pacific Islander',
                              'Black or African American',
                              'White',
                              'Multiracial'
                            ))
genderCategories <- factor(c('Female', 'Male', 'Nonbinary'))

demographicBreakout <- function(filterConditions, gradeLevels) {
      data %>%
      filter( {{ filterConditions }} ) %>%
      filter(GRADE_LVL %in% gradeLevels) %>%
      select(STUDENT_GENDER, ETHNIC_DESC) %>%
      group_by(STUDENT_GENDER, ETHNIC_DESC) %>%
      summarise(COUNT = n()) %>%
      pivot_wider(
        names_from = ETHNIC_DESC, 
        values_from = COUNT, 
        values_fill = 0
      ) %>%
      rename_at("STUDENT_GENDER", ~"Gender")
}

report <- demographicBreakout(
          filterConditions = !!quo(!!algebra1 & !!passingOnly),
          gradeLevels = highSchool
      )

This code produces a tibble that looks like this:

Gender Hispanic Asian Black White Multiracial
Female 2 15 2 26 9
Male 12 23 1 43 11

This is looking good so far, but I need to have all demographic categories present in the table even if the count is 0. I tried adding the following code snippet to my demographicBreakout function in between the summarize and pivot_wider statements:

      complete(
        ETHNIC_DESC = ethnicCategories,
        STUDENT_GENDER = genderCategories, 
        fill = list(COUNT = 0)
      ) %>%

Adding this code caused the following error:

Error in `reframe()`:
ℹ In argument: `complete(data = pick(everything()), ..., fill = fill, explicit = explicit)`.
ℹ In group 1: `STUDENT_GENDER = Female`.
Caused by error in `dplyr::full_join()`:
! Join columns in `y` must be present in the data.
✖ Problem with `STUDENT_GENDER`.

I have been unable to get this error resolved. In addition to this complete() statement, I also need to run something similar on the rows so that nonbinary counts appear. On top of all that, I still need to add in the row and column totals.

Any help getting over my current hurdle would be greatly appreciated.

Upvotes: 0

Views: 97

Answers (1)

Mark
Mark

Reputation: 12518

The function can be simplified a lot, when we combine the factor conversion idea from Stefan's (now-deleted) answer, and the .drop = FALSE argument for the count() function:

ethnicCategories <- c('Hispanic or Latino', 
                    'American Indian or Alaska Native',
                    'Asian',
                    'Native Hawaiian/Other Pacific Islander',
                    'Black or African American',
                    'White',
                    'Multiracial')
genderCategories <- c('Female', 'Male', 'Nonbinary')

demographicBreakout <- function(filterConditions, gradeLevels) {
  data |>
    filter({{filterConditions}} & GRADE_LVL %in% gradeLevels) |>
    mutate(Gender = factor(STUDENT_GENDER, levels = genderCategories),
            e = factor(ETHNIC_DESC, levels = ethnicCategories)) |>
    count(Gender, e, .drop = FALSE) |>
    pivot_wider(names_from = e, values_from = n)
}

Example:

demographicBreakout(
          filterConditions = !!quo(COURSE_CODE == '02052'),
          gradeLevels = c('09', '10', '11', '12')
      ) |> print(width = Inf)

Output:

# A tibble: 3 × 8
  Gender    `Hispanic or Latino` `American Indian or Alaska Native` Asian
  <fct>                    <int>                              <int> <int>
1 Female                       0                                  0     0
2 Male                         0                                  0     0
3 Nonbinary                    0                                  0     0
  `Native Hawaiian/Other Pacific Islander` `Black or African American` White
                                     <int>                       <int> <int>
1                                        0                           0     1
2                                        0                           0     0
3                                        0                           0     0
  Multiracial
        <int>
1           0
2           0
3           0

Upvotes: 1

Related Questions