christine

Reputation: 11

Filter lowest value within range in R

I have a dataframe that looks like this.

df <- data.frame (ptid  = c(1,1,1,1, 1, 2,2,2,3,3,3, 3),
              labid = c("CRE", "CRE", "CRE", "CRE", "CRE", "CRE", "CRE", "CRE", "CRE","CRE", "CRE", "CRE"),
              age = c(50, 54, 50.7,  51.3, 51, 52, 35, 37, 46, 46.1, 46.1, 46.1))

Within the same participant (same ptid), when several rows have ages within 2.0 years of each other, I would like to keep only the row with the lowest age.

This is what I want my result to look like:

result <- data.frame(ptid = c(1,1,2,2,3),
                     labid = c("CRE", "CRE", "CRE", "CRE", "CRE"),
                     age = c(50,54,52,35,46))

Thank you in advance for your help! I've really been struggling with this one!

Upvotes: 1

Views: 70

Answers (3)

jay.sf

Reputation: 72813

Define a function f that cuts the input into intervals and flags duplicates within each interval or, if cut fails, falls back to order(x) > 1, which is meant to keep only the smallest element. Then apply f to each ptid group using ave, and negate the result with ! so that the unflagged rows are kept. The number of intervals passed to cut is the age range (maximum minus minimum) divided by 2.

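# Flag repeats within each interval; if cut() errors (fewer than 2 intervals
# requested), fall back to order(x) > 1.
f <- function(x, ...) tryCatch(duplicated(cut(x, ...)), error=function(e) order(x) > 1)

# ave() applies f per ptid; ! keeps the rows that were not flagged.
res <- subset(df, !ave(age, ptid, FUN=function(x) f(x, diff(range(x)) / 2)))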
res
#   ptid labid age
# 1    1   CRE  50
# 2    1   CRE  54
# 6    2   CRE  52
# 7    2   CRE  35
# 9    3   CRE  46

Notes: 1. The original row order is preserved. 2. The solution also removes ties, i.e. additional rows for the same ptid and labid with the same age. (If that is not desired, look into rank() instead of order().)
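To see what the cut and duplicated steps do for a single group, here is a minimal sketch using the ages of participants 1 and 3 from the question. The variable names (age1, age3, bins, dup, failed) are only for illustration and are not part of the answer above.

```r
# Ages for participant 1, in the original row order.
age1 <- c(50, 54, 50.7, 51.3, 51)

# diff(range(age1)) / 2 is 2, so cut() splits the ages into 2 equal-width bins.
bins <- cut(age1, diff(range(age1)) / 2)
as.integer(bins)
# [1] 1 2 1 1 1

# duplicated() flags every repeat of a bin, so only the first row per bin stays.
dup <- duplicated(bins)
age1[!dup]
# [1] 50 54

# Participant 3 spans only 0.1 years, so cut() is asked for fewer than 2
# intervals and raises an error; this is the case the tryCatch() fallback covers.
age3 <- c(46, 46.1, 46.1, 46.1)
failed <- inherits(try(cut(age3, diff(range(age3)) / 2), silent = TRUE), "try-error")
failed
# [1] TRUE
```

Note that duplicated keeps the first row it meets in each bin, which in this data happens to be the lowest age; with a different row order it may be worth sorting by age first.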

Upvotes: 0

Onyambu

Reputation: 79228

You could do:

library(dplyr)

df %>%
  group_by(ptid) %>%
  arrange(ptid, age) %>%
  # start a new group once the running sum of age gaps exceeds 2
  mutate(grp = cumsum(cumsum(c(0, diff(age))) > 2)) %>%
  group_by(ptid, grp) %>%
  slice(1) %>%
  ungroup() %>%
  select(-grp)
# A tibble: 5 x 3
#    ptid labid   age
#   <dbl> <chr> <dbl>
# 1     1 CRE      50
# 2     1 CRE      54
# 3     2 CRE      35
# 4     2 CRE      52
# 5     3 CRE      46
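As a rough illustration of how the grp column comes about, the sketch below walks through the intermediate values for participant 1 in base R. The names dist_from_min, kept and three_clusters are made up for this example, and the last vector of ages is hypothetical, not taken from the question.

```r
# Sorted ages for participant 1.
age <- sort(c(50, 54, 50.7, 51.3, 51))

# Running sum of consecutive gaps, i.e. the distance from the lowest age.
dist_from_min <- cumsum(c(0, diff(age)))
round(dist_from_min, 1)
# [1] 0.0 0.7 1.0 1.3 4.0

# The group counter increases at every row where that distance exceeds 2.
grp <- cumsum(dist_from_min > 2)
grp
# [1] 0 0 0 0 1

# First (lowest) age per group, which is what slice(1) keeps.
kept <- age[!duplicated(grp)]
kept
# [1] 50 54

# Hypothetical ages (not from the question) with two rows past the 2-year mark:
three_clusters <- cumsum(cumsum(c(0, diff(c(50, 54, 54.5)))) > 2)
three_clusters
# [1] 0 1 2
```

The last line suggests a caveat: the distance is measured from the participant's lowest age, not from the start of the current group, so every row beyond the 2-year mark ends up in its own group. That makes no difference for the data in the question, but it might matter if a participant has several close measurements later on.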

Upvotes: 1

akrun

Reputation: 887118

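We could arrange by ptid and age, then use diff inside filter to keep the first row of each participant plus every row that is more than 2 years above the previous one.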

library(dplyr)
df %>%
   arrange(ptid, age) %>%
   group_by(ptid) %>%
   # always keep the lowest age, then rows more than 2 years above the previous row
   filter(c(TRUE, diff(age) > 2)) %>%
   ungroup()

Output:

# A tibble: 5 x 3
#   ptid labid   age
#  <dbl> <chr> <dbl>
#1     1 CRE      50
#2     1 CRE      54
#3     2 CRE      35
#4     2 CRE      52
#5     3 CRE      46
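Comparing consecutive rows treats a chain such as 50, 51.5, 53 as one cluster, even though 53 is more than 2 years above 50. If "within 2.0 years" is instead meant relative to the lowest age of each cluster (an assumption, the question does not say), a base R sketch along these lines might work. The helper starts_window and its width argument are hypothetical names introduced here.

```r
df <- data.frame(
  ptid  = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  labid = "CRE",
  age   = c(50, 54, 50.7, 51.3, 51, 52, 35, 37, 46, 46.1, 46.1, 46.1)
)

# Hypothetical helper: for ages sorted ascending, return TRUE for each age that
# starts a new window, i.e. lies more than `width` above the last kept age.
starts_window <- function(age, width = 2) {
  keep <- logical(length(age))
  anchor <- -Inf
  for (i in seq_along(age)) {
    if (age[i] - anchor > width) {
      keep[i] <- TRUE
      anchor <- age[i]
    }
  }
  keep
}

# Sort so the lowest age comes first within each participant.
sorted <- df[order(df$ptid, df$age), ]

# ave() returns 0/1 here because age is numeric, hence as.logical().
keep <- as.logical(ave(sorted$age, sorted$ptid, FUN = starts_window))
res <- sorted[keep, ]
res$age
# [1] 50 54 35 52 46
```

For the data in the question this gives the same five rows as the pipeline above; the two approaches only differ for chained ages, where starts_window(c(50, 51.5, 53)) keeps 50 and 53 while the diff-based filter keeps only 50.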

Upvotes: 1
