ivivek_ngs

Reputation: 937

How to filter rows in a data frame, within specific groups, that have values less than the 1st quartile for 2 columns in R?

I have a data.frame with many rows and a few columns. I have grouped the rows, and now I want to select, within each group, the rows whose values are less than that group's 1st quartile in two columns. Sample data below:

df:


ID  SD_1        SD_2        clust
4   1.613479812 2.231100475 1
6   2.348970134 4.509710677 1
7   676.6791703 855.1300148 1
8   5.702718972 9.789694982 1
17  0.69905969  1.736578132 1
18  45.94251574 32.40374486 2
20  6.655940714 6.602647859 2
21  0.367147263 0.447369751 2
22  4.316702479 6.618716644 2
25  7.481365283 7.955022446 2
32  14.916817   71.70158686 2
33  0.311656121 0.947110959 2
34  0.555539595 0.438893998 2
36  2.754111181 5.586499991 2
42  8.718620333 12.50393499 3
2   17.04906625 7.825923801 3
3   9.337794688 2.805759945 3
9   3.028141567 4.965291633 3
39  0.770520551 0.676955176 3
55  8.765592871 6.058640263 3
67  0.863034955 1.150017033 3

Above is a sample of the data. For each value of the clust column, I need to group the rows and keep only those whose values are less than or equal to the group's 1st quartile of both df$SD_1 and df$SD_2.

Is there a function or package in R that can do that? I used tapply() to find the 1st quartile of each column, grouping by the clust column, but now I want to filter the rows in each clust group that fall below the 1st quartile for both df$SD_1 and df$SD_2. I am sure there is a one-liner in R, but I have not been able to work it out. If it cannot be done in one line, how should I achieve it in R?
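The tapply() step described above might look like the following minimal sketch. It rebuilds the sample table with read.table(); the names q1_sd1 and q1_sd2 are illustrative and not from the original post.

```r
# Sample data from the question
df <- read.table(header = TRUE, text = "
ID SD_1 SD_2 clust
4 1.613479812 2.231100475 1
6 2.348970134 4.509710677 1
7 676.6791703 855.1300148 1
8 5.702718972 9.789694982 1
17 0.69905969 1.736578132 1
18 45.94251574 32.40374486 2
20 6.655940714 6.602647859 2
21 0.367147263 0.447369751 2
22 4.316702479 6.618716644 2
25 7.481365283 7.955022446 2
32 14.916817 71.70158686 2
33 0.311656121 0.947110959 2
34 0.555539595 0.438893998 2
36 2.754111181 5.586499991 2
42 8.718620333 12.50393499 3
2 17.04906625 7.825923801 3
3 9.337794688 2.805759945 3
9 3.028141567 4.965291633 3
39 0.770520551 0.676955176 3
55 8.765592871 6.058640263 3
67 0.863034955 1.150017033 3
")

# First quartile of each SD column within each cluster
# (hypothetical names q1_sd1 / q1_sd2)
q1_sd1 <- tapply(df$SD_1, df$clust, quantile, probs = 0.25)
q1_sd2 <- tapply(df$SD_2, df$clust, quantile, probs = 0.25)

print(q1_sd1)
print(q1_sd2)
```

Each result holds one value per cluster rather than one per row, which is why it cannot be compared against df$SD_1 or df$SD_2 directly without first being matched back to the rows.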

Upvotes: 0

Views: 328

Answers (1)

amatsuo_net

Reputation: 2448

With data.table you can do something like this:

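library(data.table)
setDT(df)  # convert df to a data.table by reference
# Add each cluster's 1st quartile of SD_1 and SD_2 as new columns (df is modified in place),
# then keep only the rows at or below both quartiles
df_sub <- df[, c("QSD_1", "QSD_2") := lapply(.SD, quantile, probs = .25),
   by = clust, .SDcols = c("SD_1", "SD_2")][SD_1 <= QSD_1 & SD_2 <= QSD_2]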
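For comparison, the same per-cluster filter can be sketched in base R with ave(), which repeats each group's quartile across that group's rows so it can be compared row by row. This is an alternative to the data.table approach, not part of the original answer; the helper q1 and the vector keep are hypothetical names.

```r
# Sample data from the question
df <- read.table(header = TRUE, text = "
ID SD_1 SD_2 clust
4 1.613479812 2.231100475 1
6 2.348970134 4.509710677 1
7 676.6791703 855.1300148 1
8 5.702718972 9.789694982 1
17 0.69905969 1.736578132 1
18 45.94251574 32.40374486 2
20 6.655940714 6.602647859 2
21 0.367147263 0.447369751 2
22 4.316702479 6.618716644 2
25 7.481365283 7.955022446 2
32 14.916817 71.70158686 2
33 0.311656121 0.947110959 2
34 0.555539595 0.438893998 2
36 2.754111181 5.586499991 2
42 8.718620333 12.50393499 3
2 17.04906625 7.825923801 3
3 9.337794688 2.805759945 3
9 3.028141567 4.965291633 3
39 0.770520551 0.676955176 3
55 8.765592871 6.058640263 3
67 0.863034955 1.150017033 3
")

# Hypothetical helper: first quartile as a plain number
# (default quantile type 7, the same default the data.table code relies on)
q1 <- function(x) quantile(x, probs = 0.25, names = FALSE)

# ave() returns a vector as long as df, holding each row's cluster-level quartile
keep <- df$SD_1 <= ave(df$SD_1, df$clust, FUN = q1) &
        df$SD_2 <= ave(df$SD_2, df$clust, FUN = q1)

df_sub <- df[keep, ]
print(df_sub)
```

On the sample data this keeps the rows with ID 4, 17, 21, 33, 34, 39 and 67. Unlike the data.table version, it leaves df unchanged and adds no quartile columns.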

Upvotes: 1
