user3711502

R: Trying to determine which decile each data point is in, for all variables in a data frame

I have some data on the prices consumers are willing to pay for certain services. For several services, I'm trying to use the cut function to find which decile each response falls into.

for (i in 2:13){
    x <- quantile(data1[,i], c(0,.1,.2,.3,.4,.5,.6,.7,.8,.9,1), na.rm=TRUE)

    data1[paste(names(data1)[i], "deciles", sep="_")] <- cut(data1[,i], breaks=x, include.lowest=TRUE)
}

However, I have two problems. First, for some variables two deciles have the same value (e.g. the 0% and 10% quantiles are both 0), and the cut function will not accept duplicated breaks. Second, for the columns where the code does work, I get the decile interval rather than the decile number (for example "(1.99,2.56]" instead of .2).
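
Both problems can be reproduced on a small made-up vector (an assumption for illustration, not the original price data). This sketch shows that the lower deciles coincide, that cut stops on the duplicated breaks, and that even with unique breaks the result is a factor of interval labels rather than decile numbers.

```r
# Made-up vector (not the asker's data): 40% of the values are 0,
# so several of the lower deciles coincide.
v <- c(0, 0, 0, 0, 1, 2, 3, 4, 5, 6)
brks <- quantile(v, probs = seq(0, 1, by = 0.1))

# Problem 1: the 0%, 10%, 20% and 30% quantiles are all 0
print(anyDuplicated(brks) > 0)

# cut() stops on duplicated breaks; keep the error object instead of failing
res <- tryCatch(cut(v, breaks = brks, include.lowest = TRUE),
                error = function(e) e)
print(conditionMessage(res))

# Problem 2: with unique breaks cut() works, but the result is a factor
# whose levels are interval labels rather than decile numbers
f <- cut(v, breaks = unique(brks), include.lowest = TRUE)
print(levels(f))
```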

If anyone has any ideas, I would greatly appreciate it.

Answers (1)

Rorschach

For the first problem, use only the unique breaks and pass those to cut. For the second, convert the factor to an integer and use it as an index into the vector of probabilities passed to quantile (keeping only the probabilities whose breaks survive the de-duplication), which pulls out the appropriate decile.

## Some sample data; the third column has repeated deciles, so plain `cut` fails on it
set.seed(0)
data1 <- data.frame(x=rnorm(100), y=rnorm(100), z=sample(0:5, 100, replace=TRUE))
qs <- seq(0, 1, by=0.1)                                                      # probs for quantile
for (i in 1:3){
    x <- quantile(data1[,i], qs, na.rm=TRUE)
    used <- qs[-1][diff(x) > 0]                                              # upper prob of each interval left after unique()
    cuts <- cut(data1[,i], breaks=unique(x), include.lowest=TRUE)            # factors as you had them
    data1[paste(names(data1)[i], "deciles", sep="_")] <- cuts
    data1[paste(names(data1)[i], "num", sep="_")] <- used[as.integer(cuts)]  # numeric values
}
head(data1)
#            x          y z       x_deciles x_num       y_deciles y_num z_deciles
# 1  1.2629543  0.7818592 0     (1.24,2.44]   1.0      (0.78,1.5]   0.9   [0,1.7]
# 2 -0.3262334 -0.7767766 3 (-0.421,-0.252]   0.4 (-0.956,-0.714]   0.3     (2,3]
# 3  1.3297993 -0.6159899 1     (1.24,2.44]   1.0 (-0.714,-0.459]   0.4   [0,1.7]
# 4  1.2724293  0.0465803 5     (1.24,2.44]   1.0  (0.0262,0.376]   0.7     (4,5]
# 5  0.4146414 -1.1303858 5   (0.234,0.421]   0.7   [-1.68,-1.12]   0.1     (4,5]
# 6 -1.5399500  0.5767188 5   [-2.22,-1.07]   0.1    (0.376,0.78]   0.8     (4,5]
#   z_num
# 1   0.3
# 2   0.6
# 3   0.3
# 4   0.8
# 5   0.8
# 6   0.8
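
The loop above handles three fixed columns. The sketch below wraps the same idea in a small function and applies it to every numeric column, which is closer to the "all variables in a data frame" goal in the question. `decile_prob`, `dat` and `num_cols` are hypothetical names chosen for illustration. Two small differences, both assumptions of this sketch: it asks cut for integer codes directly with labels = FALSE instead of converting the factor with as.integer, and it takes the probabilities from diff(brks) > 0, which does not depend on the smallest value being at most 0. A constant column returns NA because no interval can be formed.

```r
# Hypothetical helper (name and interface are illustrative only).
# For each value it returns the upper quantile probability of the interval the
# value falls in, using only the unique breaks so tied deciles do not break cut().
decile_prob <- function(v, probs = seq(0, 1, by = 0.1)) {
  brks <- quantile(v, probs, na.rm = TRUE)
  if (length(unique(brks)) < 2) {
    return(rep(NA_real_, length(v)))          # constant column: no intervals to form
  }
  upper <- probs[-1][diff(brks) > 0]          # prob of each interval's upper break
  idx <- cut(v, breaks = unique(brks), include.lowest = TRUE,
             labels = FALSE)                  # integer interval codes; NA stays NA
  upper[idx]
}

# Apply it to every numeric column of a sample data frame
set.seed(0)
dat <- data.frame(x = rnorm(100), z = sample(0:5, 100, replace = TRUE))
num_cols <- names(dat)[vapply(dat, is.numeric, logical(1))]
for (nm in num_cols) {
  dat[[paste(nm, "num", sep = "_")]] <- decile_prob(dat[[nm]])
}
head(dat)
```

As with z_num above, each value gets the probability of its interval's upper break, so when several deciles are tied the skipped probabilities (for example 0.1 and 0.2) never appear in the result.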