Reputation: 35
I am trying to count the number of rows in df1, which contains the date of an event,
df1 = data.frame(date = c("2021-07-31", "2021-08-01", "2021-08-12", "2021-08-14"))
that fall within the start and end dates of df2,
df2 = data.frame(Id = c(1,2),
Start = c("2021-06-01", "2021-08-01"),
End = c("2021-08-15", "2021-09-15"))
In this example, the output would look like
Id Start End Count
1 1 2021-06-01 2021-08-15 3
2 2 2021-08-01 2021-09-15 3
I have tried similar examples How to get the number of counts between two dates in R? and count row if date falls within date range for all dates in series in R without any success.
Any help or suggestions would be greatly appreciated. Thank you!
Upvotes: 3
Views: 884
Reputation: 7277
Or with base
:
df2$Count <- apply(df2, 1, function(x) sum(as.Date(df1$date) %in% seq(as.Date(x["Start"]), as.Date(x["End"]), by = "1 day")))
Output:
Id Start End Count
1 1 2021-06-01 2021-08-15 4
2 2 2021-08-01 2021-09-15 3
Upvotes: 1
Reputation: 41225
Please note: should Id 1 count be 4 in your expected output?
You can group_by
your data and sum the dates that fall %within%
the interval
like this:
df1 = data.frame(date = c("2021-07-31", "2021-08-01", "2021-08-12", "2021-08-14"))
df2 = data.frame(Id = c(1,2),
Start = c("2021-06-01", "2021-08-01"),
End = c("2021-08-15", "2021-09-15"))
library(dplyr)
library(lubridate)
df2 %>%
group_by(Id) %>%
mutate(Count = sum(as_date(df1$date) %within% lubridate::interval(Start, End)))
#> # A tibble: 2 × 4
#> # Groups: Id [2]
#> Id Start End Count
#> <dbl> <chr> <chr> <int>
#> 1 1 2021-06-01 2021-08-15 4
#> 2 2 2021-08-01 2021-09-15 3
Created on 2022-07-12 by the reprex package (v2.0.1)
Upvotes: 8
Reputation: 72613
Using data.table::between
in outer
.
f <- Vectorize(\(i, j) data.table::between(df1[i, 1L], df2[j, 2], df2[j, 3]))
transform(df2, count=colSums(outer(seq_len(nrow(df1)), seq_len(nrow(df2)), f)))
# Id Start End count
# 1 1 2021-06-01 2021-08-15 4
# 2 2 2021-08-01 2021-09-15 3
Note, that "Date"
format is required, so you may want to do this beforehand:
df1[] <- lapply(df1, as.Date)
df2[-1] <- lapply(df2[-1], as.Date)
Data:
df1 <- structure(list(date = structure(c(18839, 18840, 18851, 18853), class = "Date")), row.names = c(NA,
-4L), class = "data.frame")
df2 <- structure(list(Id = c(1, 2), Start = structure(c(18779, 18840
), class = "Date"), End = structure(c(18854, 18885), class = "Date")), row.names = c(NA,
-2L), class = "data.frame")
Upvotes: 1