Reputation: 937
I have a data.frame with many rows and a few columns. I have grouped the rows, and now I want to select, within each group, the rows whose values are below that group's 1st quartile for two of the columns. Sample data below:
df:
ID SD_1 SD_2 clust
4 1.613479812 2.231100475 1
6 2.348970134 4.509710677 1
7 676.6791703 855.1300148 1
8 5.702718972 9.789694982 1
17 0.69905969 1.736578132 1
18 45.94251574 32.40374486 2
20 6.655940714 6.602647859 2
21 0.367147263 0.447369751 2
22 4.316702479 6.618716644 2
25 7.481365283 7.955022446 2
32 14.916817 71.70158686 2
33 0.311656121 0.947110959 2
34 0.555539595 0.438893998 2
36 2.754111181 5.586499991 2
42 8.718620333 12.50393499 3
2 17.04906625 7.825923801 3
3 9.337794688 2.805759945 3
9 3.028141567 4.965291633 3
39 0.770520551 0.676955176 3
55 8.765592871 6.058640263 3
67 0.863034955 1.150017033 3
The above is a sample of the data. For each value of the clust column, I need to subset the rows: group by clust, then keep the rows whose values are less than or equal to the group's 1st quartile of both df$SD_1 and df$SD_2.
Is there a function in R or a package that can do that? I used tapply() to find the 1st quartile within each clust group, but now I want to filter the rows in each group that fall at or below the 1st quartile for df$SD_1 and df$SD_2. I am sure there is a one-liner in R, but I have not been able to work it out.
Even if it is not a one-liner, how should I achieve this in R?
Upvotes: 0
Views: 328
Reputation: 2448
With data.table you can do something like this:
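library(data.table)
setDT(df)  # converts df to a data.table by reference
# Add the per-cluster 1st quartile of each column, then keep rows at or below both
df_sub <- df[, c("QSD_1", "QSD_2") := lapply(.SD, quantile, probs = 0.25),
             by = clust, .SDcols = c("SD_1", "SD_2")][SD_1 <= QSD_1 & SD_2 <= QSD_2]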
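If you would rather avoid an extra package, the same idea can be sketched in base R with ave(), which returns a group-level statistic aligned to each row. This is only a sketch under a few assumptions: the data is rebuilt here from the table in the question, the grouping column is clust, and the names q1, keep and df_sub are made up for illustration. It uses quantile()'s default method (type 7), so results could differ slightly with another quantile type.

```r
# Data copied from the question so the example runs on its own
df <- data.frame(
  ID    = c(4, 6, 7, 8, 17, 18, 20, 21, 22, 25, 32, 33, 34, 36,
            42, 2, 3, 9, 39, 55, 67),
  SD_1  = c(1.613479812, 2.348970134, 676.6791703, 5.702718972, 0.69905969,
            45.94251574, 6.655940714, 0.367147263, 4.316702479, 7.481365283,
            14.916817, 0.311656121, 0.555539595, 2.754111181,
            8.718620333, 17.04906625, 9.337794688, 3.028141567, 0.770520551,
            8.765592871, 0.863034955),
  SD_2  = c(2.231100475, 4.509710677, 855.1300148, 9.789694982, 1.736578132,
            32.40374486, 6.602647859, 0.447369751, 6.618716644, 7.955022446,
            71.70158686, 0.947110959, 0.438893998, 5.586499991,
            12.50393499, 7.825923801, 2.805759945, 4.965291633, 0.676955176,
            6.058640263, 1.150017033),
  clust = c(rep(1, 5), rep(2, 9), rep(3, 7))
)

# Hypothetical helper: 1st quartile of a numeric vector (default type 7)
q1 <- function(x) quantile(x, probs = 0.25, names = FALSE)

# ave() repeats each cluster's quartile on every row of that cluster,
# so the comparison is row-wise against the row's own group
keep <- df$SD_1 <= ave(df$SD_1, df$clust, FUN = q1) &
        df$SD_2 <= ave(df$SD_2, df$clust, FUN = q1)

# Rows at or below the per-cluster 1st quartile in both columns
df_sub <- df[keep, ]
```

Unlike the data.table version, this leaves df unchanged and adds no helper columns to the result.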
Upvotes: 1