Jazzmine

Reputation: 1875

In R how to exclude certain rows when calculating quantile()

I'm interested in calculating the quantile function of a column in a data frame for only a subset of rows based upon another column.

For example, I have a new_user_indicator column with "Y" or "N", and want to know the quantile for "Y" group. Currently I am doing

subset_df <- subset(carddata, new_user_indicator == "Y")
quantile(subset_df$limit_amount, .25)

Is there a way to do this in one command rather than creating a subsetted data frame?

I looked at this to see if it could help but wasn't able to decipher part of the code.

Thanks

Upvotes: 0

Views: 815

Answers (1)

Zheyuan Li

Reputation: 73285

The quantile function itself does not let you operate on a subset, so you do need some way to extract the subset data.

However, you don't need to extract a whole subset data frame, as you did. quantile accepts a vector, so you only need to subset the one column rather than the entire data frame.

quantile(with(carddata, limit_amount[new_user_indicator == "Y"]), 0.25)

The with function lets you refer to the columns by name; without it you need

quantile(carddata$limit_amount[carddata$new_user_indicator == "Y"], 0.25)
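
To check that the two forms agree with the two-step subset() approach, here is a small sketch on made-up data. The values in carddata below are invented for illustration; only the column names come from the question.

```r
# Toy data: the values are invented, only the column names match the question
carddata <- data.frame(
  new_user_indicator = c("Y", "N", "Y", "Y", "N", "Y"),
  limit_amount       = c(1000, 2000, 3000, 5000, 6000, 7000)
)

# Logical index: TRUE for the rows to keep
keep <- carddata$new_user_indicator == "Y"

# One-liner using with()
q_with <- quantile(with(carddata, limit_amount[new_user_indicator == "Y"]), 0.25)

# Same thing with explicit $ indexing
q_dollar <- quantile(carddata$limit_amount[keep], 0.25)

# Two-step approach from the question, for comparison
q_subset <- quantile(subset(carddata, new_user_indicator == "Y")$limit_amount, 0.25)

identical(q_with, q_dollar)  # TRUE
identical(q_with, q_subset)  # TRUE
```

One thing to watch: if new_user_indicator contains NA, the logical index contains NA as well, the extracted vector picks up NA entries, and quantile stops with an error unless you pass na.rm = TRUE.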

update

If you need to do this repeatedly, write a function (rename foo to whatever you like):

foo <- function(df, out_var, in_var, in_level, prob) {
  quantile(df[[out_var]][df[[in_var]] == in_level], prob)
}

Then you can do:

foo(carddata, "limit_amount", "new_user_indicator", "Y", 0.25)

I am assuming you have another level "N", so for that level you can do

foo(carddata, "limit_amount", "new_user_indicator", "N", 0.25)

Here, out_var and in_var are column names (hence strings) for the output variable and the input variable. in_level is the level of the input variable to keep, and prob is the probability passed on to quantile.
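
As a quick sanity check, the helper can be tried on made-up data. The values in carddata below are invented; the sketch also relies on the fact that quantile accepts a vector of probabilities, so prob may hold several values at once.

```r
# Toy data: the values are invented, only the column names match the question
carddata <- data.frame(
  new_user_indicator = c("Y", "N", "Y", "Y", "N", "Y"),
  limit_amount       = c(1000, 2000, 3000, 5000, 6000, 7000)
)

# Helper from the answer: quantile of one column, restricted to one level
foo <- function(df, out_var, in_var, in_level, prob) {
  quantile(df[[out_var]][df[[in_var]] == in_level], prob)
}

# Single probability
q1 <- foo(carddata, "limit_amount", "new_user_indicator", "Y", 0.25)

# Several probabilities in one call
q3 <- foo(carddata, "limit_amount", "new_user_indicator", "Y", c(0.25, 0.5, 0.75))

q1  # named "25%"
q3  # named "25%", "50%", "75%"
```

Using df[[out_var]] rather than df$out_var is what makes it possible to pass the column names as strings.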


a more powerful way

If you want the 0.25 quantile for every level of the input variable, calling my function once per level is clumsy. Use tapply instead:

tapply(carddata$limit_amount, carddata$new_user_indicator, FUN = quantile, probs = 0.25)

tapply(x1, x2, FUN, ...) splits x1 into groups according to x2 and applies FUN (here quantile) to each group, passing ... along. If x2 has 10 levels, you get the 0.25 quantile for each of them.
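
A minimal sketch of the tapply call on made-up data (the values are invented; only the column names come from the question). With a single probability, the result is a vector-like array with one entry per level.

```r
# Toy data: the values are invented, only the column names match the question
carddata <- data.frame(
  new_user_indicator = c("Y", "N", "Y", "Y", "N", "Y"),
  limit_amount       = c(1000, 2000, 3000, 5000, 6000, 7000)
)

# 0.25 quantile of limit_amount within each level of new_user_indicator
res <- tapply(carddata$limit_amount, carddata$new_user_indicator,
              FUN = quantile, probs = 0.25)

res        # one value per level, labelled "N" and "Y"
res["Y"]   # pick out a single level by name
```

If you pass several probabilities, each group returns more than one number, so tapply hands back a list-like result with one quantile vector per level instead of a plain numeric array.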

Upvotes: 1
