Reputation: 28159
I have a table called tableOne in R like this:
idNum binaryVariable salePrice
2 1 55.56
4 0 88.33
15 0 4.45
87 1 35.77
... ... ...
I'd like to use the values produced by summary(tableOne$salePrice) to split the rows into four quartiles by salePrice, and then create a column tableOne$quartile recording which quartile each row's salePrice falls in. It would look like:
idNum binaryVariable salePrice quartile
2 1 55.56 3
4 0 88.33 4
15 0 4.45 1
87 1 35.77 2
... ... ... ...
Any suggestions?
Upvotes: 30
Views: 47950
Reputation: 1364
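The following function returns a vector of n-tile group numbers (quartiles by default):
qgroup = function(numvec, n = 4){
  # n + 1 break points, from the minimum (0%) to the maximum (100%)
  qtile = quantile(numvec, probs = seq(0, 1, 1/n))
  # group number = how many lower bounds (every break except the maximum) x reaches
  out = sapply(numvec, function(x) sum(x >= qtile[-(n+1)]))
  return(out)
}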
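As a quick check, here is a minimal sketch that applies qgroup to the four sample rows shown in the question. The data frame is rebuilt by hand from those rows, and the function is repeated so the snippet runs on its own.

```r
# Sample rows copied from the question
tableOne <- data.frame(
  idNum = c(2, 4, 15, 87),
  binaryVariable = c(1, 0, 0, 1),
  salePrice = c(55.56, 88.33, 4.45, 35.77)
)

# qgroup() repeated from above so this snippet is self-contained
qgroup <- function(numvec, n = 4) {
  qtile <- quantile(numvec, probs = seq(0, 1, 1/n))
  sapply(numvec, function(x) sum(x >= qtile[-(n + 1)]))
}

tableOne$quartile <- qgroup(tableOne$salePrice)
tableOne$quartile
# 3 4 1 2
```

Note that a value sitting exactly on an interior break lands in the higher group here, whereas cut() with its default right-closed intervals would put it in the lower one.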
Upvotes: 0
Reputation: 484
With dplyr you could use the ntile function:
ntile(x, n)
library(dplyr)
tableOne$quartile <- ntile(tableOne$salePrice, 4)
This adds a column to the table giving, for each row, the number (1 to n) of the price group it falls in.
Note: This method starts with the lower values at 1 and works upwards from there.
Upvotes: 9
Reputation: 47320
Using the cutr package, we can do:
# devtools::install_github("moodymudskipper/cutr")
library(cutr)
tableOne$quartile <- smart_cut(tableOne$salePrice, 4, "g", output = "numeric")
# idNum binaryVariable salePrice quartile
# 1 2 1 55.56 3
# 2 4 0 88.33 4
# 3 15 0 4.45 1
# 4 87 1 35.77 2
Upvotes: 0
Reputation: 47
You can use the following script:
tableOne$Quartile <- ifelse(tableOne$salePrice <= quantile(tableOne$salePrice, 0.25), 1,
                     ifelse(tableOne$salePrice <= quantile(tableOne$salePrice, 0.5), 2,
                     ifelse(tableOne$salePrice <= quantile(tableOne$salePrice, 0.75), 3,
                     ifelse(tableOne$salePrice <= quantile(tableOne$salePrice, 1), 4, NA))))
Upvotes: 0
Reputation: 898
A data.table approach:
library(data.table)
tableOne <- setDT(tableOne)[, quartile := cut(salePrice, quantile(salePrice, probs=0:4/4), include.lowest=TRUE, labels=FALSE)]
Upvotes: 9
Reputation: 1127
Setting the parameter labels=FALSE in cut() returns the categories as integer codes instead of a factor. See ?cut
tableOne <- within(tableOne, quartile <- cut(salePrice, quantile(salePrice, probs=0:4/4), include.lowest=TRUE, labels=FALSE))
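One caveat with passing quantile() output to cut(): if the data has many ties, two or more quartile break points can coincide, and cut() refuses duplicated breaks. The sketch below uses a made-up prices vector (not from the question) to show the situation and capture the error rather than abort.

```r
# Hypothetical prices with many ties at the low end (illustrative only)
prices <- c(1, 1, 1, 1, 2, 3)

breaks <- quantile(prices, probs = 0:4/4)
anyDuplicated(breaks) > 0   # TRUE: several quartile break points coincide

# cut() stops with an error when breaks repeat; capture it instead of aborting
res <- tryCatch(
  cut(prices, breaks, include.lowest = TRUE, labels = FALSE),
  error = function(e) e
)
inherits(res, "error")      # TRUE
```

How to handle that case (fewer groups, rank-based grouping, and so on) depends on what you want the groups to mean, so it is worth checking anyDuplicated(breaks) before relying on the result.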
Upvotes: 6
Reputation: 40821
This should do it:
tableOne <- within(tableOne, quartile <- as.integer(cut(salePrice, quantile(salePrice, probs=0:4/4), include.lowest=TRUE)))
...Some details:
The within function is great for calculating new columns. You don't have to refer to columns as tableOne$salePrice etc.
tableOne <- within(tableOne, quartile <- <<<some expression>>>)
The quantile function calculates the quantiles (or in your case, quartiles). 0:4/4 evaluates to c(0, 0.25, 0.50, 0.75, 1).
Finally the cut function splits your data into those quartiles. But you get a factor with weird names, so as.integer turns it into groups 1,2,3,4.
Try ?within etc to learn more about the functions mentioned here...
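To see the three steps separately, here is a small sketch on the four sample rows from the question. The data frame is rebuilt by hand, and the intermediate names breaks and groups are only for illustration.

```r
# Sample rows copied from the question
tableOne <- data.frame(
  idNum = c(2, 4, 15, 87),
  binaryVariable = c(1, 0, 0, 1),
  salePrice = c(55.56, 88.33, 4.45, 35.77)
)

# Step 1: five break points at 0%, 25%, 50%, 75% and 100%
breaks <- quantile(tableOne$salePrice, probs = 0:4/4)

# Step 2: cut() returns a factor whose levels are interval labels.
# include.lowest = TRUE keeps the minimum, which would otherwise become NA
# because the intervals are open on the left.
groups <- cut(tableOne$salePrice, breaks, include.lowest = TRUE)
levels(groups)

# Step 3: as.integer() replaces the interval labels with the codes 1 to 4
tableOne$quartile <- as.integer(groups)
tableOne$quartile
# 3 4 1 2
```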
Upvotes: 57