Reputation: 11
I have a dataframe that looks like this.
df <- data.frame(ptid = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
                 labid = c("CRE", "CRE", "CRE", "CRE", "CRE", "CRE", "CRE", "CRE", "CRE", "CRE", "CRE", "CRE"),
                 age = c(50, 54, 50.7, 51.3, 51, 52, 35, 37, 46, 46.1, 46.1, 46.1))
Within the same participant (same ptid), I would like to keep only one row for each set of ages that fall within 2.0 years of each other.
This is what I want my result to look like:
result <- data.frame(ptid = c(1, 1, 2, 2, 3),
                     labid = c("CRE", "CRE", "CRE", "CRE", "CRE"),
                     age = c(50, 54, 52, 35, 46))
Thank you in advance for your help! I've really been struggling with this one!
Upvotes: 1
Views: 70
Reputation: 72813
Define a function f that either cuts the input into intervals and identifies duplicates, or gives back the smallest element if that fails. Then apply f on each ptid level using ave with a negation !, where you get the number of intervals to cut into by dividing the difference between the maximum and minimum age by 2.
f <- function(x, ...) tryCatch(duplicated(cut(x, ...)), error=function(e) order(x) > 1)
res <- subset(df, !ave(age, ptid, FUN=function(x) f(x, diff(range(x)) / 2)))
res
# ptid labid age
# 1 1 CRE 50
# 2 1 CRE 54
# 6 2 CRE 52
# 7 2 CRE 35
# 9 3 CRE 46
Notes: 1. The original order of the observations is preserved. 2. The solution removes duplicates (ties), i.e. cases where several ptid-labid rows have the same age. (If this is not desired, look into rank() instead of order().)
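To make the cut/duplicated step concrete, here is a minimal sketch for the ages of ptid 1 only. The variable names (x, n, bins, dup) are illustrative and not part of the answer above.

```r
# Ages for ptid 1, taken from the question's data
x <- c(50, 54, 50.7, 51.3, 51)

# Number of intervals: range of the ages divided by 2 (here 4 / 2 = 2)
n <- diff(range(x)) / 2

# cut() assigns each age to one of n equal-width intervals
bins <- cut(x, n)

# duplicated() flags every age after the first one seen in each interval
dup <- duplicated(bins)

# The rows to keep are the non-duplicates
kept <- x[!dup]
kept
```

Because the intervals form a fixed grid, two ages that are less than 2 years apart but fall on either side of an interval boundary would both be kept, so it is worth checking the result on data with such cases.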
Upvotes: 0
Reputation: 79228
You could do:
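library(dplyr)

df %>%
  group_by(ptid) %>%
  arrange(ptid, age) %>%
  # cumsum(c(0, diff(age))) is each age's distance from the youngest age in
  # the ptid; every row more than 2 years above it increments grp
  mutate(grp = cumsum(cumsum(c(0, diff(age))) > 2)) %>%
  group_by(ptid, grp) %>%
  slice(1) %>%   # keep the youngest row of each group
  ungroup() %>%
  select(-grp)
# A tibble: 5 x 3
#    ptid labid   age
#   <dbl> <chr> <dbl>
# 1     1 CRE      50
# 2     1 CRE      54
# 3     2 CRE      35
# 4     2 CRE      52
# 5     3 CRE      46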
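The nested cumsum can be hard to read, so here is a small base R sketch of the intermediate values for the sorted ages of ptid 1. The names offset and grp_ids are only illustrative.

```r
# Sorted ages for ptid 1, taken from the question's data
age <- c(50, 50.7, 51, 51.3, 54)

# Inner cumsum: distance of each age from the youngest age
offset <- cumsum(c(0, diff(age)))

# Outer cumsum: the group id goes up by one at every row whose
# offset exceeds 2 years
grp_ids <- cumsum(offset > 2)

# Keeping the first row of each group mirrors slice(1)
first_per_group <- age[!duplicated(grp_ids)]
first_per_group
```

Note that every row more than 2 years above the youngest age starts its own group, so two later ages that are close to each other (for example 54 and 54.5) would both be kept.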
Upvotes: 1
Reputation: 887118
We could do an arrange and use diff in filter:
library(dplyr)
df %>%
  arrange(ptid, age) %>%
  group_by(ptid) %>%
  # always keep the youngest row, then keep rows more than 2 years
  # after the previous age
  filter(c(TRUE, diff(age) > 2)) %>%
  ungroup()
-output
# A tibble: 5 x 3
# ptid labid age
# <dbl> <chr> <dbl>
#1 1 CRE 50
#2 1 CRE 54
#3 2 CRE 35
#4 2 CRE 52
#5 3 CRE 46
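The diff(age) comparison looks at each age's immediate predecessor, so a chain such as 50, 51.5, 53 would keep only 50. If the intended rule is instead "more than 2 years after the last kept row" (an assumption, since the question does not say), a base R sketch could look like this. keep_spaced is a hypothetical helper, not an existing function.

```r
# Hypothetical helper: walk through the ages in increasing order and keep
# an age only if it is more than `gap` years above the last kept age.
keep_spaced <- function(age, gap = 2) {
  keep <- logical(length(age))
  last_kept <- -Inf
  for (i in order(age)) {
    if (age[i] - last_kept > gap) {
      keep[i] <- TRUE
      last_kept <- age[i]
    }
  }
  keep
}

df <- data.frame(ptid = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
                 labid = rep("CRE", 12),
                 age = c(50, 54, 50.7, 51.3, 51, 52, 35, 37, 46, 46.1, 46.1, 46.1))

# ave() applies the helper within each ptid and returns 0/1 values,
# which are converted back to logical for row selection
res <- df[as.logical(ave(df$age, df$ptid, FUN = keep_spaced)), ]
res
```

On the question's data this keeps the rows with ages 50, 54, 52, 35 and 46, in their original order.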
Upvotes: 1