RTD

Reputation: 65

Computing Quantiles for a column in R to subset

I have a data set with the following structure:

Name=c("a","b","c")
Amount_Spent=c(386407,213918,212006)

What I am trying to do is compute which quartile each Amount_Spent value falls into and assign the result to a new variable (column), Quantiles. I have not been able to get this result with any of the apply functions. Can someone help, please?

Thanks in advance, Raoul

Upvotes: 5

Views: 8684

Answers (2)

jlhoward

Reputation: 59355

The answer you get depends on how finely you want to cut the quantiles. Do you want quartiles (25% increments), deciles (10% increments), or percentiles (1% increments)?
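
To make that choice concrete, here is a minimal sketch using the Amount_Spent values from the question. Only the `probs` argument of `quantile` changes between the three cases; the variable names (`quartiles`, `deciles`, `percentiles`) are just illustrative.

```r
# Amount_Spent values from the question
Amount_Spent <- c(386407, 213918, 212006)

# Same call, different granularity: only `probs` changes
quartiles   <- quantile(Amount_Spent, probs = seq(0, 1, 0.25))  # 5 cut points
deciles     <- quantile(Amount_Spent, probs = seq(0, 1, 0.10))  # 11 cut points
percentiles <- quantile(Amount_Spent, probs = seq(0, 1, 0.01))  # 101 cut points

quartiles
```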

I have a feeling there's an easier way to do this, but here's one approach.

df           <- data.frame(Name,Amount_Spent)
q            <- quantile(df$Amount_Spent,probs=seq(0,1,0.01))  # percentiles
# function to retrieve the closest quantile for a given value;
# which.min returns a single index even if two quantiles are equally close
get.quantile <- function(x) names(q)[which.min(abs(q-x))]
# apply this function to all values in df$Amount_Spent
df$Quantile  <- sapply(df$Amount_Spent,get.quantile)
df
#   Name Amount_Spent Quantile
# 1    a       386407     100%
# 2    b       213918      50%
# 3    c       212006       0%

Here is a slightly more interesting example:

set.seed(1)
df <- data.frame(Name=letters,Amount_Spent=runif(26,2e5,4e5))
q <- quantile(df$Amount_Spent,probs=seq(0,1,0.01))
df$Quantile <- sapply(df$Amount_Spent,get.quantile)
head(df)

#   Name Amount_Spent Quantile
# 1    a     253101.7      24%
# 2    b     274424.8      32%
# 3    c     314570.7      52%
# 4    d     381641.6      88%
# 5    e     240336.4      12%
# 6    f     379677.9      84%

Upvotes: 1

user20650

Reputation: 25854

You can do this using cut and quantile.

# some data
df <- data.frame(name=letters, am.spent=rnorm(26))

# divide df$am.spent
df$qnt <- cut(df$am.spent, breaks=quantile(df$am.spent),
              labels=1:4, include.lowest=TRUE)

# check ranges
tapply(df$am.spent, df$qnt, range)

First, get the quartile boundaries with quantile(df$am.spent):

#        0%        25%        50%        75%       100% 
#-3.5888426 -0.6879445 -0.1461107  0.5835165  1.2030989 


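Then use cut to divide df$am.spent at the specified cutpoints. Here we cut at the quartile values, which are passed via the breaks argument. Setting include.lowest=TRUE keeps the minimum value in the first bin; without it, the minimum would come back as NA.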
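
Applied to the data from the question, the same idea might look like the sketch below. The column name Quantiles follows the question; `brks` and `top` are illustrative names, and the final subsetting step is an assumption about what "to subset" in the title means.

```r
# Data from the question
df <- data.frame(Name = c("a", "b", "c"),
                 Amount_Spent = c(386407, 213918, 212006))

# Quartile boundaries (0%, 25%, 50%, 75%, 100%)
brks <- quantile(df$Amount_Spent)

# Label each row with its quartile (1 = lowest, 4 = highest);
# include.lowest = TRUE keeps the minimum value in bin 1
df$Quantiles <- cut(df$Amount_Spent, breaks = brks,
                    labels = 1:4, include.lowest = TRUE)

# Subset: keep only the rows in the top quartile
top <- df[df$Quantiles == 4, ]
```

Note that cut stops with an error if the breaks are not unique, which can happen when the column has many tied values.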

Upvotes: 5
