Sharath


Subset a column using a threshold value by grouping the ID column

I have a data frame like this (for example):

ID <- c("A","A","A","A","A","B","B","B","B","B") 
Point_A <- c(10,25,30,40,50,60,75,80,90,100) 
Point_B <- c(21,32,43,54,65,11,22,53,94,15)
df1 <- data.frame(ID,Point_A,Point_B)

I want to subset the data frame to the rows where Point_A is below a threshold, where the threshold is the group median of the Point_A column minus 7.5.

Currently I subset using the median of the entire column minus 7.5:

df2 <- subset(df1, Point_A < median(Point_A) - 7.5)
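To see why a whole-column threshold is not enough, here is a minimal base-R sketch on the example data. The overall median of Point_A is 55, so the single threshold is 47.5, and every row that passes belongs to ID A. The names `overall_threshold` and `df2` are only illustrative.

```r
ID <- c("A","A","A","A","A","B","B","B","B","B")
Point_A <- c(10,25,30,40,50,60,75,80,90,100)
Point_B <- c(21,32,43,54,65,11,22,53,94,15)
df1 <- data.frame(ID, Point_A, Point_B)

# One threshold for the whole column, ignoring ID: median(Point_A) is 55, so 47.5
overall_threshold <- median(df1$Point_A) - 7.5
df2 <- subset(df1, Point_A < overall_threshold)

# Keeps Point_A 10, 25, 30, 40 (all from ID A); no row from ID B passes
df2
```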

Instead, I want to compute the median within each group (each ID, here A and B), subtract 7.5, and subset each row against its own group's threshold.

Desired Output

ID  Point_A  Point_B 
A      10      21
B      60      11

For ID A, the median of Point_A is 30, and 30 - 7.5 = 22.5, so only 10 appears in the output for A. The same applies to B: its median is 80, the threshold is 72.5, and only 60 qualifies.
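A minimal base-R sketch of the per-group threshold described above, offered as a dplyr-free alternative rather than as the asker's code. `tapply()` shows each ID's threshold, and `ave()` returns each row's own group median so the comparison can be done row by row. The names `thresholds`, `group_median` and `out` are illustrative.

```r
ID <- c("A","A","A","A","A","B","B","B","B","B")
Point_A <- c(10,25,30,40,50,60,75,80,90,100)
Point_B <- c(21,32,43,54,65,11,22,53,94,15)
df1 <- data.frame(ID, Point_A, Point_B)

# Per-group threshold: median of Point_A within each ID, minus 7.5
thresholds <- tapply(df1$Point_A, df1$ID, median) - 7.5
thresholds  # A: 22.5, B: 72.5

# ave() repeats each group's median on every row of that group
group_median <- ave(df1$Point_A, df1$ID, FUN = median)
out <- df1[df1$Point_A < group_median - 7.5, ]
out  # rows (A, 10, 21) and (B, 60, 11)
```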

How can I do this?



Answers (1)

Steven Beaupré


Try

library(dplyr)
df1 %>% group_by(ID) %>% filter(Point_A < median(Point_A) - 7.5,
                                Point_B < median(Point_B) - 7.5)

Or, following @Frank's suggestion in the comments:

mycond <- function(x) x < median(x) - 7.5 
df1 %>% group_by(ID) %>% filter(mycond(Point_A), mycond(Point_B))

Which gives:

#Source: local data frame [2 x 3]
#Groups: ID
#
#  ID Point_A Point_B
#1  A      10      21
#2  B      60      11
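For comparison, a base-R sketch of the same two-column filter without dplyr. `below_group_median()` is a hypothetical helper modeled on `mycond()`, with the 7.5 offset exposed as an assumed `offset` parameter. As with `filter()` given two conditions, the two logical vectors are combined with `&`, so a row must pass both.

```r
ID <- c("A","A","A","A","A","B","B","B","B","B")
Point_A <- c(10,25,30,40,50,60,75,80,90,100)
Point_B <- c(21,32,43,54,65,11,22,53,94,15)
df1 <- data.frame(ID, Point_A, Point_B)

# Hypothetical helper: TRUE where x is below its own group's median minus offset
below_group_median <- function(x, g, offset = 7.5) {
  x < ave(x, g, FUN = median) - offset
}

# Keep rows that pass the per-ID test on both columns
keep <- below_group_median(df1$Point_A, df1$ID) &
        below_group_median(df1$Point_B, df1$ID)
res <- df1[keep, ]
res  # rows (A, 10, 21) and (B, 60, 11)
```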

Edit

I may have misinterpreted your original question. If you only want to filter on Point_A, use:

df1 %>% group_by(ID) %>% filter(Point_A < median(Point_A) - 7.5)

