Reputation: 412
I have some data on the prices consumers are willing to pay for certain services. For several services, I'm trying to find the decile each response falls into using the cut function.
for (i in 2:13){
x<-quantile(data1[,i],c(0,.1,.2,.3,.4,.5,.6,.7,.8,.9,1),na.rm=TRUE)
data1[paste(names(data1[i]), "deciles", sep="_")] <- cut(data1[,i], breaks=x, include.lowest=TRUE)
}
However, I have two problems. First, for some variables two deciles have the same value (e.g. the 0% and 10% quantiles are both 0), which the cut function will not accept. Second, for the initial columns where the code does work, I get the decile's interval and not the decile number (for example "(1.99,2.56]" instead of .2).
If anyone has any ideas, I would greatly appreciate it.
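A minimal reproduction of the first problem, using made-up prices rather than my real data (the `price` vector is purely illustrative). Because most values are 0, several of the lower deciles coincide and cut stops with an error:

```r
# Hypothetical prices: most respondents would pay nothing,
# so the lower deciles all come out as 0
price <- c(0, 0, 0, 0, 0, 0, 1, 2, 5, 10)
breaks <- quantile(price, seq(0, 1, by = 0.1), na.rm = TRUE)

# TRUE when at least one break repeats an earlier one
has_dupes <- anyDuplicated(breaks) > 0
has_dupes

# cut() refuses duplicated breaks; capture the error message instead of stopping
result <- tryCatch(
  cut(price, breaks = breaks, include.lowest = TRUE),
  error = function(e) conditionMessage(e)
)
result
```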
Upvotes: 3
Views: 632
Reputation: 32426
For the first problem: pass only the unique breaks to cut. For the second, convert the factor to an integer and use that integer as an index into the probs vector to pull out the probability of the matching quantile break.
## Some sample data; the third column has repeated deciles, so a plain `cut` fails on it
set.seed(0)
data1 <- data.frame(x=rnorm(100), y=rnorm(100), z=sample(0:5, 100, replace=TRUE))
qs <- seq(0, 1, by=0.1)  # probs for quantile
for (i in 1:3){
  x <- quantile(data1[,i], qs, na.rm=TRUE)
  used <- qs[-1][diff(x) > 0]  # upper prob of each interval that survives unique()
  cuts <- cut(data1[,i], breaks=unique(x), include.lowest=TRUE)  # factors as you had them
  data1[paste(names(data1[i]), "deciles", sep="_")] <- cuts
  data1[paste(names(data1[i]), "num", sep="_")] <- used[as.integer(cuts)]  # numeric values
}
head(data1)
# x y z x_deciles x_num y_deciles y_num z_deciles
# 1 1.2629543 0.7818592 0 (1.24,2.44] 1.0 (0.78,1.5] 0.9 [0,1.7]
# 2 -0.3262334 -0.7767766 3 (-0.421,-0.252] 0.4 (-0.956,-0.714] 0.3 (2,3]
# 3 1.3297993 -0.6159899 1 (1.24,2.44] 1.0 (-0.714,-0.459] 0.4 [0,1.7]
# 4 1.2724293 0.0465803 5 (1.24,2.44] 1.0 (0.0262,0.376] 0.7 (4,5]
# 5 0.4146414 -1.1303858 5 (0.234,0.421] 0.7 [-1.68,-1.12] 0.1 (4,5]
# 6 -1.5399500 0.5767188 5 [-2.22,-1.07] 0.1 (0.376,0.78] 0.8 (4,5]
# z_num
# 1 0.3
# 2 0.6
# 3 0.3
# 4 0.8
# 5 0.8
# 6 0.8
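If you need this for many columns, the same idea can be wrapped in a small function. This is only a sketch: `decile_of` is a name made up here, not a base R function, and it assumes the column has at least two distinct quantile values (otherwise cut has nothing to split on). It returns the upper probability of the interval each value falls into, so tied deciles collapse into the highest one they share.

```r
# decile_of() is a hypothetical helper, not part of base R
decile_of <- function(v, probs = seq(0, 1, by = 0.1)) {
  q <- quantile(v, probs, na.rm = TRUE)
  # keep the first break and every break larger than its predecessor
  keep <- c(TRUE, diff(q) > 0)
  # labels = FALSE makes cut() return the interval index as an integer
  bins <- cut(v, breaks = q[keep], include.lowest = TRUE, labels = FALSE)
  # map each interval index to the probability of its upper break
  probs[keep][-1][bins]
}

# Made-up prices with many zeros, so the lower deciles coincide
v <- c(0, 0, 0, 0, 0, 0, 1, 2, 5, 10)
dec <- decile_of(v)
dec
```

To apply it to several columns at once, something like `data1[paste0(names(data1)[1:3], "_num")] <- lapply(data1[1:3], decile_of)` should work on the sample data above.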
Upvotes: 1