Economist
Economist

Reputation: 183

Creating column that is a proportion of two conditions

I have a data frame with about 50 variables but where the ones in the example under are the most important. My aim is to create a table that includes various elements split by department and gender. The combination of dplyr, group_by and summarise gives me most of what I need but I haven't been able to figure out how to get separate columns that shows for example meanFemaleSalary/meanMaleSalary per department. I'm able to get the mean salary per gender per department in separate data frames, but either get an error or just a single value when I try to divide them.

I have tried searching the site and found what I believed was similar questions but couldn't get any of the answers to work. I'd be grateful if anyone could give me a hint on how to proceed… Thanks!

Example:

library(dplyr)
x <- data.frame(Department = rep(c("Dep1", "Dep2", "Dep3"), times=2),
                Gender = rep(c("F", "M"), times=3),
                Salary = seq(10,15))

This is what I have that actually works so far:

Table <- x %>% group_by(Department, Gender) %>% summarise(Count = n(),
                                                          AverageSalary = mean(Salary, na.rm = T),
                                                          MedianSalary = median(Salary, na.rm = T))

I'd like two additional columns for AvgSalaryWomen/Men and MedianSalaryWomen/Men. Again thanks!

Upvotes: 1

Views: 110

Answers (3)

ulfelder
ulfelder

Reputation: 5335

If you want to end up with a table that has one row per department and includes all of the descriptive statistics you're computing along the way, you probably need to convert to long, unite some columns to use as a key, go back to wide, and then add your ratios. Something like...

Table <- x %>%
  group_by(Department, Gender) %>%
  summarise(Count = n(),
            AverageSalary = mean(Salary, na.rm = TRUE),
            MedianSalary = median(Salary, na.rm = TRUE)) %>%
  # convert to long form
  gather(Quantity, Value, -Department, -Gender) %>%
  # create a unified gender/measure column to use as the key in the next step 
  unite(Set, Gender, Quantity) %>%
  # go back to wide, now with repeating columns by gender
  spread(Set, Value) %>%
  # compute the department-level quantities you want using those new cols
  mutate(AverageSalaryWomenMen = F_AverageSalary/M_AverageSalary,
         MedianSalaryWomenMen = F_MedianSalary/M_MedianSalary)

Upvotes: 2

Relasta
Relasta

Reputation: 1106

If you want the new columns to be part of Table you could do something like this. But it will result in the value being repeated per department.

Table %>% group_by(Department) %>% 
    mutate(`AvgSalaryWomen/Men` = AverageSalary[Gender == "F"]/AverageSalary[Gender == "M"],
           `MedianSalaryWomen/Men` = MedianSalary[Gender == "F"]/MedianSalary[Gender == "M"])

# Department Gender Count AverageSalary MedianSalary `AvgSalaryWomen/Men` `MedianSalaryWomen/Men`
# <fct>      <fct>  <int>         <dbl>        <int>                <dbl>                   <dbl>
# 1 Dep1       F          1           10.           10                0.769                   0.769
# 2 Dep1       M          1           13.           13                0.769                   0.769
# 3 Dep2       F          1           14.           14                1.27                    1.27 
# 4 Dep2       M          1           11.           11                1.27                    1.27 
# 5 Dep3       F          1           12.           12                0.800                   0.800
# 6 Dep3       M          1           15.           15                0.800                   0.800

If you want just one row per department simply change mutate to summarise and you'll get

# Department `AvgSalaryWomen/Men` `MedianSalaryWomen/Men`
#   <fct>                     <dbl>                   <dbl>
# 1 Dep1                      0.769                   0.769
# 2 Dep2                      1.27                    1.27 
# 3 Dep3                      0.800                   0.800

Upvotes: 2

akrun
akrun

Reputation: 887651

Here is an option to get this by spreading it to wide format

library(tidyverse)
x %>%
  spread(Gender, Salary) %>%
  group_by(Department) %>%
  summarise(`AvgSalaryWomen/Men` = mean(F)/mean(M), 
           `MedianSalaryWomen/Men` = median(F)/median(M))

# A tibble: 3 x 3
#    Department `AvgSalaryWomen/Men` `MedianSalaryWomen/Men`     
#    <fctr>                    <dbl>                   <dbl>
# 1 Dep1                      0.769                   0.769
# 2 Dep2                      1.27                    1.27 
# 3 Dep3                      0.800                   0.800           `

Upvotes: 2

Related Questions