Quantile results for the entire dataframe

Question

I have a fairly large data set with around 100 variables and around 1 million observations. It contains both numeric and categorical variables. I want to calculate quantiles for all the numeric variables, but when I try the following: quantile(dat1, c(.10, .30, .5, .75, .9), na.rm = TRUE)

I get an error in R saying "non-numeric argument to binary operator"

Could anyone suggest the appropriate code for this? I appreciate all your help. Thanks.

Sathish · Accepted Answer

Quantiles of all numeric columns combined

# sample data with numeric and non-numeric (character) columns
df <- data.frame(a = 1:5, b = 1:5, c = letters[1:5])
col_numeric <- which( sapply(df, is.numeric ) )   # get numeric column indices
# unlist() pools the values of all numeric columns into a single vector
quantile( x = unlist( df[,  col_numeric] ), 
          probs = c(.10, .30, .5, .75, .9),
          na.rm = TRUE )

# 10% 30% 50% 75% 90% 
#  1   2   3   4   5
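As a minimal sketch of an alternative (not part of the original answer), base R's Filter() can select the numeric columns directly instead of computing their indices. As above, unlist() pools every numeric value into one vector, so the result is a single set of quantiles across all numeric columns, not one set per column. The names num_df, probs and pooled are illustrative only.

```r
# Sample data: two numeric columns and one non-numeric column
df <- data.frame(a = 1:5, b = 1:5, c = letters[1:5])
probs <- c(.10, .30, .5, .75, .9)

# Filter() keeps only the columns for which is.numeric() returns TRUE
num_df <- Filter(is.numeric, df)

# Pool all numeric values into one vector and compute its quantiles
pooled <- quantile(unlist(num_df, use.names = FALSE), probs = probs, na.rm = TRUE)
pooled
```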

Quantiles of each numeric column separately

sapply( col_numeric, function( y ) {
  quantile( x = df[[ y ]],              # extract one numeric column as a vector
            probs = c(.10, .30, .5, .75, .9),
            na.rm = TRUE )
})

#       a   b
# 10% 1.4 1.4
# 30% 2.2 2.2
# 50% 3.0 3.0
# 75% 4.0 4.0
# 90% 4.6 4.6
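With around 100 variables, it may be easier to read the result with one row per variable. The following is a sketch under that assumption (not from the original answer): vapply() is used instead of sapply() so that R checks that every column returns exactly one value per probability, and t() transposes the matrix. The names num_df, probs, per_col and per_col_t are illustrative only.

```r
# Sample data: two numeric columns and one non-numeric column
df <- data.frame(a = 1:5, b = 1:5, c = letters[1:5])
probs <- c(.10, .30, .5, .75, .9)

# Keep only the numeric columns
num_df <- Filter(is.numeric, df)

# vapply() requires each call to return length(probs) numeric values;
# the row names (10%, 30%, ...) come from the first quantile() result
per_col <- vapply(num_df, quantile, FUN.VALUE = numeric(length(probs)),
                  probs = probs, na.rm = TRUE)

# Transpose so each variable is a row and each probability is a column
per_col_t <- t(per_col)
per_col_t
```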

Since your real data set is large, you could use the data.table package for efficiency.

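library('data.table')
# setDT() converts df to a data.table by reference (df itself is modified);
# .SDcols restricts .SD to the numeric columns; each result row is one probability
setDT(df)[, lapply( .SD, quantile, probs = c(.10, .30, .5, .75, .9), na.rm = TRUE ), .SDcols = col_numeric ]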
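The data.table result above has no column saying which probability each row belongs to. A minimal sketch that adds one, assuming the data.table package is installed: j returns a list whose first element is a prob label column, followed by one quantile column per numeric variable. quantile()'s names = FALSE argument drops the "10%"-style names. The names dt, probs, num_cols and res are illustrative only.

```r
library(data.table)

# Sample data: two numeric columns and one non-numeric column
dt <- data.table(a = 1:5, b = 1:5, c = letters[1:5])
probs <- c(.10, .30, .5, .75, .9)

# Names of the numeric columns
num_cols <- names(dt)[vapply(dt, is.numeric, logical(1))]

# One row per probability: a 'prob' label column, then one column per numeric variable
res <- dt[, c(list(prob = probs),
              lapply(.SD, quantile, probs = probs, na.rm = TRUE, names = FALSE)),
          .SDcols = num_cols]
res
```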
