HM8689

Reputation: 143

Quantiles in R with an increment of 0.01 between the upper limit of the lower quantile and the lower limit of the upper quantile

I have the following code in R to generate quintiles for my dataframe. However, the quintiles it generates are: "[0.22,4.16]" "(4.16,7.15]" "(7.15,9.7]" "(9.7,19.2]" "(19.2,78.4]".

Instead, I would like the lower limit of each quintile to be 0.01 above the upper limit of the previous quintile. So I want them to be: "[0.22,4.16]" "(4.17,7.15]" "(7.16,9.7]" "(9.71,19.2]" "(19.21,78.4]".

Any help will be much appreciated.

library(dplyr)
library(gtools)

mydata <- mydata %>%
  mutate(Value = ifelse(Value == -1, NA, Value),
         Value = quantcut(Value, q = seq(0, 1, by = 0.2), na.rm = TRUE))

Upvotes: 0

Views: 398

Answers (1)

KenHBS

Reputation: 7164

quantcut() gives you [0.22, 4.16], (4.16,7.15], (7.15,9.7], (9.7,19.2] and (19.2,78.4]. All possible values in your range are covered by this way of cutting the intervals into quintiles.

You want to have: [0.22, 4.16], (4.17,7.15], (7.16,9.7], (9.71,19.2] and (19.21,78.4]. This leaves a gap just above each quintile border: any value greater than 4.16 and up to and including 4.17 falls into neither the first nor the second interval. The number 4.17 itself, for example, is not in the first interval and is also excluded from the second, because the left border of every interval but the first is open. The same applies to 7.16, 9.71 and 19.21.
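To make the gap concrete, here is a small check in base R. The helper `in_interval()` and the two test values are made up for illustration; the interval bounds are the ones from the question.

```r
# Hypothetical helper: TRUE if x lies in the half-open interval (lower, upper]
in_interval <- function(x, lower, upper) x > lower & x <= upper

x <- c(4.165, 4.17)  # illustrative values just above the first quintile border

# Original labels: the second interval (4.16, 7.15] contains both values
covered_original <- in_interval(x, 4.16, 7.15)

# Shifted labels: neither [0.22, 4.16] nor (4.17, 7.15] contains them
in_first  <- x >= 0.22 & x <= 4.16
in_second <- in_interval(x, 4.17, 7.15)
covered_shifted <- in_first | in_second

print(covered_original)
print(covered_shifted)
```

With the original labels both values are covered; with the shifted labels neither is.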

Having said that, let's assume you have a very strong reason to justify your choice.

You first have to build the new level labels from the old ones, and then replace the old labels with the new ones. If you use mapvalues() from the plyr package for the second step, you will not have to manually mess around with adding levels to your factor:

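library(gtools)  # provides quantcut()

mydata$quants <- quantcut(mydata$Value, q = seq(0, 1, by = 0.2), na.rm = TRUE)

# Step 1: Build the new labels from the old ones with regular expressions.
# The first level opens with "[" and stays unchanged, so it is dropped here.
old_vals <- levels(mydata$quants)[-1]

# Match the lower bound, i.e. everything between "(" and ","
pattern <- "(?<=\\()(.*)(?=,)"
regs <- regexpr(pattern, old_vals, perl = TRUE)
repl <- as.numeric(regmatches(old_vals, regs)) + 0.01

new_vals <- mapply(sub, replacement = repl, x = old_vals,
                   MoreArgs = list(pattern = pattern, perl = TRUE),
                   USE.NAMES = FALSE)

# Step 2: Replace the old labels with the new ones.
# plyr is not attached, so it cannot mask dplyr functions such as mutate().
mydata$quants <- plyr::mapvalues(mydata$quants, from = old_vals, to = new_vals)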
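If you would rather avoid the plyr dependency, the same relabeling can be sketched in base R by assigning to levels() directly. This is only an illustration under assumptions: `shift_lower()` is a hypothetical helper, and the factor `quants` is made-up data that reuses the level labels reported in the question.

```r
# Level labels as reported in the question
labs <- c("[0.22,4.16]", "(4.16,7.15]", "(7.15,9.7]", "(9.7,19.2]", "(19.2,78.4]")

# Made-up observations, only to have a factor to relabel
quants <- factor(c(labs[2], labs[5], labs[1], labs[2]), levels = labs)

# Hypothetical helper: raise the lower bound of a "(a,b]" label by `by`
shift_lower <- function(label, by = 0.01) {
  m <- regexpr("(?<=\\()[^,]+(?=,)", label, perl = TRUE)
  if (m[1] == -1) return(label)  # no "(" bound, e.g. the first level "[0.22,4.16]"
  regmatches(label, m) <- format(as.numeric(regmatches(label, m)) + by)
  label
}

# Assigning to levels() renames the levels; the observations follow automatically
levels(quants) <- vapply(levels(quants), shift_lower, character(1),
                         USE.NAMES = FALSE)
print(levels(quants))
```

Because only the labels change, every observation stays in the quintile it was originally assigned to; the new labels merely no longer describe the cut points exactly.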

Upvotes: 1
