Reputation: 57
I have a dataset that contains several IDs and a sample date column, like this:
# Dates are quoted: unquoted, 1991-05-23 is evaluated as the subtraction 1991 - 5 - 23
dataframe <- data.frame(ID=c("ID1","ID2","ID3","ID4","ID2","ID2","ID3","ID4","ID5","ID1"),
  sample_date=c("1991-05-23","1991-05-24","1991-05-24","1991-05-26","1991-05-27","1991-05-28","1991-05-30","1991-05-31","1991-06-03","1991-06-03"),
  sex=c(1,2,1,2,2,2,1,2,1,1), and_so_om=c(1))
I want to group the rows by ID and detect whether the same ID has sample_date values that are very close together (e.g. within 3 days).
First, I tried to organize the data frame by ID, and this is as far as I got:
library(dplyr)
outcome <- dataframe %>% select(ID, sample_date) %>% count(ID, sample_date)
From here, I don't know how to calculate the difference in days between sample_date values within the same ID.
Upvotes: 1
Views: 220
Reputation: 101916
I think aggregate from base R is enough for this:
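# Convert to Date inside the function so that diff() returns day gaps;
# min(..., Inf) keeps IDs with a single sample from producing an empty result
dfout <- aggregate(sample_date ~ ID, dataframe,
  function(x) min(as.numeric(diff(sort(as.Date(x)))), Inf) <= 3)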
which gives
> dfout
ID sample_date
1 ID1 FALSE
2 ID2 TRUE
3 ID3 FALSE
4 ID4 FALSE
5 ID5 FALSE
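If the actual gaps are needed rather than just a TRUE/FALSE flag, the same idea can be sketched with split and diff. This is only an illustration: it assumes sample_date is stored as quoted "YYYY-MM-DD" strings, and the name gaps is made up for the example.

```r
# Assumption: same IDs and dates as in the question, with dates as quoted strings
dataframe <- data.frame(
  ID = c("ID1", "ID2", "ID3", "ID4", "ID2", "ID2", "ID3", "ID4", "ID5", "ID1"),
  sample_date = c("1991-05-23", "1991-05-24", "1991-05-24", "1991-05-26",
                  "1991-05-27", "1991-05-28", "1991-05-30", "1991-05-31",
                  "1991-06-03", "1991-06-03")
)

# Split the dates by ID, sort each group, and take consecutive differences in days
gaps <- lapply(
  split(as.Date(dataframe$sample_date), dataframe$ID),
  function(d) as.numeric(diff(sort(d)))
)
gaps
```

Each list element holds the day gaps between consecutive samples of one ID; an ID with a single sample gets an empty vector.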
Upvotes: 1
Reputation: 389065
Perhaps you can try:
library(dplyr)
n <- 3
dataframe %>%
  mutate(sample_date = as.Date(sample_date)) %>%
  arrange(ID, sample_date) %>%
  group_by(ID) %>%
  # TRUE if any two consecutive samples of an ID are at most n days apart
  summarise(is_close = any(diff(sample_date) <= n))
which gives
# ID is_close
# <fct> <lgl>
#1 ID1 FALSE
#2 ID2 TRUE
#3 ID3 FALSE
#4 ID4 FALSE
#5 ID5 FALSE
This checks, for each ID, whether any sample_date falls within n days of the previous one.
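To see which individual samples are close, not just which IDs, a per-row variant could look like the sketch below. It is an assumption-laden illustration rather than part of the original answer: the column days_since_prev and the object gaps are invented names, and the snippet repeats the ID and sample_date columns so it runs on its own.

```r
library(dplyr)

# Assumption: dates stored as quoted "YYYY-MM-DD" strings
dataframe <- data.frame(
  ID = c("ID1", "ID2", "ID3", "ID4", "ID2", "ID2", "ID3", "ID4", "ID5", "ID1"),
  sample_date = c("1991-05-23", "1991-05-24", "1991-05-24", "1991-05-26",
                  "1991-05-27", "1991-05-28", "1991-05-30", "1991-05-31",
                  "1991-06-03", "1991-06-03")
)

n <- 3
gaps <- dataframe %>%
  mutate(sample_date = as.Date(sample_date)) %>%
  arrange(ID, sample_date) %>%
  group_by(ID) %>%
  mutate(
    # Days since the previous sample of the same ID (NA for the first sample)
    days_since_prev = as.numeric(sample_date - lag(sample_date)),
    # Flag rows that follow the previous sample by at most n days
    is_close = !is.na(days_since_prev) & days_since_prev <= n
  ) %>%
  ungroup()
gaps
```

Using lag keeps one row per sample, so the flagged rows can be filtered or inspected directly instead of collapsing each ID to a single value.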
data
dataframe <- data.frame(ID=c("ID1","ID2","ID3","ID4","ID2","ID2","ID3","ID4",
"ID5","ID1"), sample_date=c("1991-05-23", "1991-05-24","1991-05-24",
"1991-05-26", "1991-05-27","1991-05-28","1991-05-30","1991-05-31",
"1991-06-03", "1991-06-03"), sex =c(1,2,1,2,2,2,1,2,1,1), and_so_om = 1)
Upvotes: 2