Reputation: 55
I have a large CSV file, and I am trying to find the mean and median of one column, grouped by the values in another. One of my columns is titled 'Race' and another is called 'debt_to_income_ratio'. The 'Race' column takes one of four values: 'White', 'Black', 'Hispanic', and 'Other'. The 'debt_to_income_ratio' column holds a number giving the debt-to-income ratio for that row. I am trying to get the median and mean debt-to-income ratio for each race (White, Black, Hispanic, and Other).
The code I am currently using is:
df['race average'] = df.groupby('Race')['debt_to_income_ratio'].transform('mean')
df['race median'] = df.groupby('Race')['debt_to_income_ratio'].transform('median')
I'm not really sure what I should be doing, so thanks in advance for any help!
Upvotes: 0
Views: 160
Reputation: 13
An option based on the base R aggregate function. Is this what you mean?
race_median = aggregate(debt_to_income_ratio ~ Race, data = df, FUN = function(x) quantile(x, 0.5, na.rm = TRUE))
race_mean = aggregate(debt_to_income_ratio ~ Race, data = df, FUN = "mean")
Upvotes: 0
Reputation: 886938
We can use dplyr to do this:
library(dplyr)
df %>%
  group_by(Race) %>%
  mutate(Mean = mean(debt_to_income_ratio, na.rm = TRUE),
         Median = median(debt_to_income_ratio, na.rm = TRUE))
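If the data is actually in a pandas DataFrame, as the syntax in the question suggests, a rough equivalent might look like the sketch below. The sample values are made up for illustration, and the column names `race_mean` and `race_median` are assumptions; the real data would come from the CSV file.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV contents.
df = pd.DataFrame({
    "Race": ["White", "White", "Black", "Black", "Hispanic", "Other"],
    "debt_to_income_ratio": [30.0, 40.0, 20.0, 50.0, 35.0, None],
})

# One row per race with the mean and median (missing values are skipped by default).
summary = df.groupby("Race")["debt_to_income_ratio"].agg(["mean", "median"])
print(summary)

# Alternatively, attach the group statistics to every row, like mutate() does.
grouped = df.groupby("Race")["debt_to_income_ratio"]
df["race_mean"] = grouped.transform("mean")
df["race_median"] = grouped.transform("median")
```

`agg` returns a small table with one row per race, which is usually what is wanted for reporting, while `transform` keeps the original number of rows and repeats each group's value on every row of that group.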
Upvotes: 1