Reputation: 65
I have a data set with the following structure:
Name=c("a","b","c")
Amount_Spent=c(386407,213918,212006)
What I am trying to do is compute which quartile each Amount_Spent value falls into for each name and assign the result to a new variable (column), Quantiles. I have not been able to get this result with any of the apply functions. Can someone help, please?
Thanks in advance, Raoul
Upvotes: 5
Views: 8684
Reputation: 59355
The answer you get depends on how finely you want to cut the quantiles. Do you want quartiles (25% increments), deciles (10% increments), or percentiles (1% increments)?
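As a rough illustration of that choice, the sketch below computes all three granularities for the amounts in the question by changing only the probs argument of quantile. The variable names are illustrative and not part of the original post.

```r
# Amounts taken from the question; the variable names are illustrative.
amount <- c(386407, 213918, 212006)

# The step of the probs sequence sets how finely the data is cut.
quartiles   <- quantile(amount, probs = seq(0, 1, 0.25))  # 25% steps
deciles     <- quantile(amount, probs = seq(0, 1, 0.10))  # 10% steps
percentiles <- quantile(amount, probs = seq(0, 1, 0.01))  # 1% steps

names(quartiles)     # labels of the quartile cutpoints
length(deciles)      # number of decile cutpoints
length(percentiles)  # number of percentile cutpoints
```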
I have a feeling there's an easier way to do this, but here's one approach.
df <- data.frame(Name,Amount_Spent)
q <- quantile(df$Amount_Spent,probs=seq(0,1,0.01)) # percentiles
# function to retrieve the closest quantile for a given value (reads the global q).
# which.min returns a single index, so ties resolve to the first match.
get.quantile <- function(x)names(q)[which.min(abs(q-x))]
# apply this function for all values in df$Amount_Spent
df$Quantile <- sapply(df$Amount_Spent,get.quantile)
df
#   Name Amount_Spent Quantile
# 1    a       386407     100%
# 2    b       213918      50%
# 3    c       212006       0%
Here is a slightly more interesting example:
set.seed(1)
df <- data.frame(Name=letters,Amount_Spent=runif(26,2e5,4e5))
q <- quantile(df$Amount_Spent,probs=seq(0,1,0.01)) # recompute q; get.quantile reads it
df$Quantile <- sapply(df$Amount_Spent,get.quantile)
head(df)
#   Name Amount_Spent Quantile
# 1    a     253101.7      24%
# 2    b     274424.8      32%
# 3    c     314570.7      52%
# 4    d     381641.6      88%
# 5    e     240336.4      12%
# 6    f     379677.9      84%
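Since get.quantile depends on whatever q happens to be defined globally, a variant that takes the quantiles as an argument may be easier to reuse. The following is only a sketch: nearest_quantile is a hypothetical name, and it is applied here to the question's data with quartiles rather than percentiles.

```r
# Hypothetical helper: same idea as get.quantile, but q is an explicit argument.
nearest_quantile <- function(x, q) {
  # which.min picks the first quantile with the smallest absolute distance to x
  names(q)[which.min(abs(q - x))]
}

# Data from the question
df <- data.frame(Name = c("a", "b", "c"),
                 Amount_Spent = c(386407, 213918, 212006))

# Quartiles this time; change the probs step for deciles or percentiles
quartiles <- quantile(df$Amount_Spent, probs = seq(0, 1, 0.25))

# Label each amount with its closest quartile
df$Quantile <- sapply(df$Amount_Spent, nearest_quantile, q = quartiles)
df
```

With only three rows, the amounts land on the 100%, 50% and 0% quartiles respectively, matching the percentile result for the same data.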
Upvotes: 1
Reputation: 25854
You can do this using cut and quantile.
# some data
df <- data.frame(name=letters , am.spent = rnorm(26))
# divide df$am.spent
df$qnt<- cut(df$am.spent , breaks=quantile(df$am.spent),
labels=1:4, include.lowest=TRUE)
# check ranges
tapply(df$am.spent , df$qnt , range)
First, get the quantiles:
quantile(df$am.spent)
#         0%        25%        50%        75%       100%
# -3.5888426 -0.6879445 -0.1461107  0.5835165  1.2030989
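Then use cut to divide df$am.spent at the specified cutpoints; here we cut at the values of the quantiles. The cutpoints are passed through the breaks argument. Setting include.lowest=TRUE keeps the minimum value in the first bin; without it, the minimum would come out as NA.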
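Applied to the data from the question, the same approach might look like the sketch below. The column name Quantiles follows the question; everything else mirrors the code above.

```r
# Data from the question
df <- data.frame(Name = c("a", "b", "c"),
                 Amount_Spent = c(386407, 213918, 212006))

# quantile() defaults to quartile cutpoints (0%, 25%, 50%, 75%, 100%);
# cut() assigns each amount to one of the four bins between them.
df$Quantiles <- cut(df$Amount_Spent,
                    breaks = quantile(df$Amount_Spent),
                    labels = 1:4, include.lowest = TRUE)
df
```

Here a falls in quartile 4, b in quartile 2 and c in quartile 1. Note that cut stops with an error if two of the breaks coincide, which can happen when the data contains many tied values.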
Upvotes: 5