Reputation: 111
I need to filter a data frame so that only the rows whose values are at or above the third quartile (Q3, the 0.75 quantile) in certain columns are kept. Let me explain. I have the following data frame:
https://drive.google.com/file/d/1blYWBXCrXpH37Wz4r0mVJGbwFsdesGi-/view?usp=sharing
I need code that returns a table with all the columns, but only the rows that are at or above Q3 (0.75) in every one of the following columns:
educ, salario, salini, tiempemp, expprev
Any ideas? Thanks in advance, everyone!
I have temporarily solved this by calculating the quantiles manually and filtering on the resulting values, as shown below. Is there a way to improve this solution?
quantile(empleados$educ, .75)
quantile(empleados$salario, .75)
quantile(empleados$salini, .75)
quantile(empleados$tiempemp, .75)
quantile(empleados$expprev, .75)
library(dplyr)
data.frame(empleados)
# salario is the sort key for arrange(), so it belongs outside the filter() call
arrange(filter(empleados, educ >= 12, salario >= 28500, salini >= 14250, tiempemp >= 88, expprev >= 122.25), salario)
ok <- arrange(filter(empleados, educ >= 12, salario >= 28500, salini >= 14250, tiempemp >= 88, expprev >= 122.25), salario)
View(ok)
Upvotes: 1
Views: 1546
Reputation: 6567
A version that uses base R
# downloaded data file located here...
df <- read.csv('~/Downloads/Empleados.dat', sep = '\t')
numerics <- c("educ", "salario", "salini", "tiempemp", "expprev")
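# quantile() returns the 0%, 25%, 50%, 75% and 100% values by default; element 4 is Q3
quantiles <- sapply(numerics, function(n) quantile(df[,n])[4])
quantilenames <- names(quantiles)
# one logical column per variable: TRUE where the value is at or above that variable's Q3
comparison <- data.frame(mapply(function(x,y) df[,y] >= quantiles[x], quantilenames, numerics))
# a row qualifies only if every comparison in it is TRUE
comparison$alltrue <- apply(comparison, MARGIN = 1, all)
df.1 <- cbind(df, comparison)
df.1[df.1$alltrue,]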
#    id sexo    fechnac educ catlab salario salini tiempemp expprev educ.75. salario.75. salini.75. tiempemp.75. expprev.75. alltrue
#6   11    2   2/7/1950   16      1   30300  16500       98     143     TRUE        TRUE       TRUE         TRUE        TRUE    TRUE
#7   14    2  2/26/1949   15      1   35100  16800       98     137     TRUE        TRUE       TRUE         TRUE        TRUE    TRUE
#21  74    2  4/28/1933   15      1   33900  19500       93     192     TRUE        TRUE       TRUE         TRUE        TRUE    TRUE
#50 134    2 11/10/1941   16      3   41550  24990       89     285     TRUE        TRUE       TRUE         TRUE        TRUE    TRUE
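The same idea can probably be written more compactly without the helper columns. The following is only a sketch on a small made-up data frame: the toy values and the names toy, cols, keep and result are assumptions for illustration and are not taken from the original file. It builds one logical vector per column and combines them with &.

```r
# Toy data standing in for the real file (values invented for illustration)
toy <- data.frame(
  id      = 1:8,
  educ    = c(8, 12, 12, 15, 15, 16, 16, 19),
  salario = c(20000, 22000, 25000, 27000, 30000, 31000, 35000, 40000)
)

# Columns that must all be at or above their own 0.75 quantile
cols <- c("educ", "salario")

# One logical vector per column, then combined element-wise with &
keep <- Reduce(
  `&`,
  lapply(toy[cols], function(x) x >= quantile(x, 0.75, na.rm = TRUE))
)

# which() drops rows where the comparison is NA instead of returning NA rows
result <- toy[which(keep), ]
print(result)
```

With the real data, toy would be replaced by df and cols by the five column names from the question.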
Upvotes: 1
Reputation: 389235
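We can use mutate_at to flag, for each of the selected columns, whether a value exceeds that column's 0.75 quantile, and then use filter_at to keep only the rows where all the flags are TRUE. Note that the comparison here is a strict >, so rows whose value is exactly equal to Q3 are excluded.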
library(dplyr)
cols <- c("educ", "salario", "salini", "tiempemp", "expprev")
Empleados %>%
  mutate_at(cols, list(col = ~. > quantile(., 0.75))) %>%
  filter_at(vars(ends_with('col')), all_vars(.)) %>%
  select(-ends_with('col'))
#   id sexo    fechnac educ catlab salario salini tiempemp expprev
#1  11    2   2/7/1950   16      1   30300  16500       98     143
#2 134    2 11/10/1941   16      3   41550  24990       89     285
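In recent dplyr releases the scoped verbs (mutate_at, filter_at) are superseded, and the same filter can likely be expressed in one step with if_all(), which is available from dplyr 1.0.4 onwards. This is a minimal sketch on a made-up data frame: the toy values and the names toy and result are assumptions for illustration only. It keeps the strict > comparison used above, so no temporary flag columns are needed.

```r
library(dplyr)

# Toy data standing in for the real file (values invented for illustration)
toy <- data.frame(
  id      = 1:8,
  educ    = c(8, 12, 12, 15, 15, 16, 16, 19),
  salario = c(20000, 22000, 25000, 27000, 30000, 31000, 35000, 40000)
)

cols <- c("educ", "salario")

# Keep rows where every listed column is strictly above its own 0.75 quantile
result <- toy %>%
  filter(if_all(all_of(cols), ~ .x > quantile(.x, 0.75, na.rm = TRUE)))

print(result)
```

Changing > to >= would also keep rows that sit exactly on Q3, which matches the thresholds used in the question.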
Upvotes: 1