screechOwl

Reputation: 28159

How to create a column with a quartile rank?

I have a table called tableOne in R like this:

idNum        binaryVariable        salePrice
2               1                    55.56
4               0                    88.33
15              0                     4.45
87              1                    35.77
...            ...                    ...

I'd like to use the values produced by summary(tableOne$salePrice) to split the rows into four quartiles by salePrice. I'd then like to create a column tableOne$quartile that records which quartile each row's salePrice falls in. It would look like:

idNum        binaryVariable            salePrice      quartile
    2               1                    55.56            3
    4               0                    88.33            4
    15              0                     4.45            1
    87              1                    35.77            2 
    ...            ...                    ...            ...  

Any suggestions?

Upvotes: 30

Views: 47950

Answers (7)

JDG

Reputation: 1364

The following code creates an ntile group vector:

qgroup = function(numvec, n = 4){

    qtile = quantile(numvec, probs = seq(0, 1, 1/n))
    out = sapply(numvec, function(x) sum(x >= qtile[-(n+1)]))

    return(out)
}
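As a quick check, here is a sketch of calling the function above on the four sample prices from the question (the function is repeated so the snippet runs on its own; the variable names prices and groups are only for illustration). Because the comparison is x >= breakpoint, a value that sits exactly on a breakpoint goes into the higher group, which can differ from what cut() does with its default right-closed intervals.

```r
# Same function as above, repeated so this snippet is self-contained
qgroup = function(numvec, n = 4){
    qtile = quantile(numvec, probs = seq(0, 1, 1/n))
    out = sapply(numvec, function(x) sum(x >= qtile[-(n+1)]))
    return(out)
}

# The four sample sale prices from the question
prices <- c(55.56, 88.33, 4.45, 35.77)
groups <- qgroup(prices)
print(groups)   # 3 4 1 2

# Values that land exactly on a breakpoint move up a group:
# the quartile breakpoints of 1:5 are 1, 2, 3, 4, 5
print(qgroup(1:5))   # 1 2 3 4 4
```

Note that the function does not handle NA values; quantile() stops with an error on them unless na.rm = TRUE is passed.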

Upvotes: 0

Derwin Brennan

Reputation: 484

With dplyr you could use the ntile function:

ntile(x, n)


tableOne$quartile <- ntile(tableOne$salePrice, 4)

This adds a column giving each row's bucket (1 to n) when the rows are ranked by price and split into n groups of roughly equal size.

Note: The lowest values get bucket 1, and the numbering works upwards from there.

Upvotes: 9

moodymudskipper

Reputation: 47320

Using the cutr package, we can do:

# devtools::install_github("moodymudskipper/cutr")
library(cutr)
df$quartile <- smart_cut(df$salePrice, 4, "g", output = "numeric")
#   idNum binaryVariable salePrice quartile
# 1     2              1     55.56        3
# 2     4              0     88.33        4
# 3    15              0      4.45        1
# 4    87              1     35.77        2

Upvotes: 0

Akash Sharma

Reputation: 47

You can use the following script:

tableOne$quartile<-ifelse(tableOne$salePrice<=quantile(tableOne$salePrice,c(0.25)),1,
                           ifelse(tableOne$salePrice<=quantile(tableOne$salePrice,c(0.5)),2,
                                  ifelse(tableOne$salePrice<=quantile(tableOne$salePrice,c(0.75)),3,
                                         ifelse(tableOne$salePrice<=quantile(tableOne$salePrice,c(1)),4,NA))))

Upvotes: 0

usct01

Reputation: 898

A data.table approach:

    library(data.table)
    tableOne <- setDT(tableOne)[, quartile := cut(salePrice, quantile(salePrice, probs=0:4/4), include.lowest=TRUE, labels=FALSE)]

Upvotes: 9

ddiez

Reputation: 1127

Setting the parameter labels=FALSE in cut() makes it return plain integer codes instead of a factor with interval labels. See ?cut

tableOne <- within(tableOne, quartile <- cut(salePrice, quantile(salePrice, probs=0:4/4), include.lowest=TRUE, labels=FALSE))

Upvotes: 6

Tommy

Reputation: 40821

This should do it:

tableOne <- within(tableOne, quartile <- as.integer(cut(salePrice, quantile(salePrice, probs=0:4/4), include.lowest=TRUE)))

Some details:

The within function is great for calculating new columns. You don't have to refer to columns as tableOne$salePrice etc.

tableOne <- within(tableOne, quartile <- <<<some expression>>>)

The quantile function calculates the quantiles (or in your case, quartiles). 0:4/4 evaluates to c(0, 0.25, 0.50, 0.75, 1).

Finally the cut function splits your data into those quartiles. But you get a factor with weird names, so as.integer turns it into groups 1,2,3,4.

Try ?within, ?quantile and ?cut to learn more about the functions mentioned here.
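To make the steps concrete, here is a small worked sketch using only the four sample rows shown in the question (the intermediate variable breaks is introduced here for illustration). It prints the breakpoints that quantile() produces and then bins the prices with cut(), using labels = FALSE to get integer codes directly.

```r
# The four sample rows from the question
tableOne <- data.frame(
  idNum = c(2, 4, 15, 87),
  binaryVariable = c(1, 0, 0, 1),
  salePrice = c(55.56, 88.33, 4.45, 35.77)
)

# Quartile breakpoints at 0%, 25%, 50%, 75% and 100%
breaks <- quantile(tableOne$salePrice, probs = 0:4/4)
print(breaks)

# include.lowest = TRUE puts the minimum price in bin 1 instead of NA;
# labels = FALSE returns integer codes rather than a factor
tableOne$quartile <- cut(tableOne$salePrice, breaks,
                         include.lowest = TRUE, labels = FALSE)
print(tableOne)
```

On these four rows the quartile column comes out as 3, 4, 1, 2, matching the table in the question. One caveat: if the data has so many tied values that two breakpoints coincide, cut() stops with an error saying the breaks are not unique, so heavily tied data needs separate handling.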

Upvotes: 57
