thestacker

Reputation: 3

How do I exclude the top and bottom 5% of a column in a data frame in R?

If my data frame is called "houses" and I want to exclude the top 5% and bottom 5% of the column Sale_Price, how do I do that?

houses[quantile(Sale_Price, c(.05, .95))

I tried this code, but I'm getting errors.

Upvotes: 0

Views: 1437

Answers (2)

Ronak Shah

Reputation: 388817

Using dplyr, we can do

library(dplyr)

houses %>% filter(between(Sale_Price, 
                  quantile(Sale_Price, 0.05), quantile(Sale_Price, 0.95)))

Or with data.table

library(data.table)

setDT(houses)
houses[Sale_Price %between% quantile(Sale_Price, c(.05, .95))]

Upvotes: 1

Here is some data that I assume is similar to what you have.

houses <- data.frame(Sale_Price = rnorm(100, 50, 5))

The code below keeps only the prices between the 5th and 95th percentiles of Sale_Price:

# Calculate the 5th and 95th percentiles
quants <- quantile(houses$Sale_Price, probs = c(0.05, 0.95))
# Keep whole rows (not just the price vector) strictly between the two percentiles
df1 <- houses[houses$Sale_Price > quants[1] & houses$Sale_Price < quants[2], , drop = FALSE]
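As a quick sanity check of the quantile approach, here is a minimal base R sketch on made-up data. The Lot_Area column and the seed are assumptions added purely for illustration; the real houses data frame is not shown in the question. It filters whole rows, so the other columns stay aligned with Sale_Price.

```r
# Hypothetical example data: Lot_Area and the seed are illustrative assumptions
set.seed(1)
houses <- data.frame(
  Sale_Price = rnorm(100, mean = 50, sd = 5),
  Lot_Area   = runif(100, min = 500, max = 2000)
)

# 5th and 95th percentiles of Sale_Price
quants <- quantile(houses$Sale_Price, probs = c(0.05, 0.95))

# Logical index: TRUE for rows strictly inside the two cut-offs
keep <- houses$Sale_Price > quants[1] & houses$Sale_Price < quants[2]

# Subset rows; the empty column index and drop = FALSE keep the result a data frame
trimmed <- houses[keep, , drop = FALSE]

nrow(trimmed)            # number of rows that survive the trimming
range(trimmed$Sale_Price) # should fall inside quants
```

Note that this uses strict inequalities, whereas dplyr's between() and data.table's %between% include the endpoints by default. With continuous prices the difference rarely matters, but it can when many rows share the cut-off value.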

Upvotes: 1
