Lauren

Reputation: 55

How do I find median and mean of certain values in a column?

I have a large CSV file, and I am trying to find the median and mean of one column for specific groups. One of my columns is titled 'Race' and another is called 'debt_to_income_ratio'. The Race column takes four values: 'White', 'Black', 'Hispanic', and 'Other'. The 'debt_to_income_ratio' column holds the debt-to-income ratio for each row, and that row's race is recorded in the 'Race' column. I am trying to get the median and mean debt-to-income ratio for each race (White, Black, Hispanic, and Other).

The code I am currently using is:

# pandas: transform() writes each Race group's statistic onto every row of that group
df['race average'] = df.groupby('Race')['debt_to_income_ratio'].transform('mean')
df['race median'] = df.groupby('Race')['debt_to_income_ratio'].transform('median')

I'm not really sure what I should be doing, so thanks in advance for any help!
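In pandas, groupby(...).transform(...) computes a statistic per group and broadcasts it back onto every row of that group, so the DataFrame keeps its original length. A minimal sketch, assuming a small made-up DataFrame in place of the CSV (the Race labels come from the question; the ratios are invented for illustration):

```python
import pandas as pd

# Hypothetical toy data standing in for the asker's CSV
df = pd.DataFrame({
    "Race": ["White", "Black", "White", "Hispanic", "Other", "Black", "White"],
    "debt_to_income_ratio": [0.25, 0.5, 0.25, 0.5, 0.125, 1.0, 1.0],
})

grouped = df.groupby("Race")["debt_to_income_ratio"]
# transform() returns a Series aligned with df's index,
# so each row receives the mean/median of its own Race group
df["race average"] = grouped.transform("mean")
df["race median"] = grouped.transform("median")
print(df)
```

Every 'White' row ends up with the same mean (0.5) and median (0.25), which shows how the two statistics differ when a group is skewed.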

Upvotes: 0

Views: 160

Answers (2)

Jelmer

Reputation: 13

An option based on the base R aggregate function. Is this what you mean?

# One row per Race; the formula interface drops rows with NA by default (na.action = na.omit)
race_median = aggregate(debt_to_income_ratio ~ Race, data = df, FUN = function(x) quantile(x, 0.5, na.rm = TRUE))
race_mean   = aggregate(debt_to_income_ratio ~ Race, data = df, FUN = "mean")
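For comparison, since the question's own code is pandas: the counterpart of this aggregate() call (one summary row per race) is groupby(...).agg(...). A minimal sketch, assuming the same kind of hypothetical toy DataFrame rather than the asker's real data:

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question
df = pd.DataFrame({
    "Race": ["White", "Black", "White", "Hispanic", "Other", "Black", "White"],
    "debt_to_income_ratio": [0.25, 0.5, 0.25, 0.5, 0.125, 1.0, 1.0],
})

# agg() with a list of function names returns one row per Race
# and one column per statistic, named "mean" and "median"
summary = df.groupby("Race")["debt_to_income_ratio"].agg(["mean", "median"])
print(summary)
```

By default groupby sorts the group keys, so the rows come out as Black, Hispanic, Other, White.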

Upvotes: 0

akrun

Reputation: 886938

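We can use dplyr to do this.

library(dplyr)
# mutate() keeps every row and adds each group's statistics as new columns;
# use summarise() instead to get one row per Race
df %>%
    group_by(Race) %>%
    mutate(Mean = mean(debt_to_income_ratio, na.rm = TRUE),
           Median = median(debt_to_income_ratio, na.rm = TRUE))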
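The na.rm = TRUE arguments matter when the column has missing values. If you stay in pandas as in the question, groupby mean and median already skip NaN by default. A small sketch with made-up values (the missing entries are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing ratios, to mimic na.rm = TRUE behaviour
df = pd.DataFrame({
    "Race": ["White", "White", "White", "Black", "Black"],
    "debt_to_income_ratio": [0.25, np.nan, 0.75, 0.5, np.nan],
})

# "count" counts only non-missing values; mean/median ignore NaN by default
stats = df.groupby("Race")["debt_to_income_ratio"].agg(["count", "mean", "median"])
print(stats)
```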
   

Upvotes: 1
