wertisml
wertisml

Reputation: 35

Count the number of entries that fall within a range of dates in a separate dataframe in R

I am trying to count the number of rows in df1, which contains the date of an event,

df1 = data.frame(date = c("2021-07-31", "2021-08-01", "2021-08-12", "2021-08-14"))

that fall within the start and end dates of df2,

df2 = data.frame(Id = c(1,2), 
                 Start = c("2021-06-01", "2021-08-01"), 
                 End = c("2021-08-15", "2021-09-15"))

In this example, the output would look like

  Id      Start        End Count
1  1 2021-06-01 2021-08-15     3
2  2 2021-08-01 2021-09-15     3

I have tried similar examples How to get the number of counts between two dates in R? and count row if date falls within date range for all dates in series in R without any success.

Any help or suggestions would be greatly appreciated. Thank you!

Upvotes: 3

Views: 884

Answers (3)

harre
harre

Reputation: 7277

Or with base:

df2$Count <- apply(df2, 1, function(x) sum(as.Date(df1$date) %in% seq(as.Date(x["Start"]), as.Date(x["End"]), by = "1 day")))

Output:

  Id      Start        End Count
1  1 2021-06-01 2021-08-15     4
2  2 2021-08-01 2021-09-15     3

Upvotes: 1

Quinten
Quinten

Reputation: 41225

Please note: should Id 1 count be 4 in your expected output?

You can group_by your data and sum the dates that fall %within% the interval like this:

df1 = data.frame(date = c("2021-07-31", "2021-08-01", "2021-08-12", "2021-08-14"))
df2 = data.frame(Id = c(1,2), 
                 Start = c("2021-06-01", "2021-08-01"), 
                 End = c("2021-08-15", "2021-09-15"))

library(dplyr)
library(lubridate)
df2 %>%
  group_by(Id) %>%
  mutate(Count = sum(as_date(df1$date) %within% lubridate::interval(Start, End)))
#> # A tibble: 2 × 4
#> # Groups:   Id [2]
#>      Id Start      End        Count
#>   <dbl> <chr>      <chr>      <int>
#> 1     1 2021-06-01 2021-08-15     4
#> 2     2 2021-08-01 2021-09-15     3

Created on 2022-07-12 by the reprex package (v2.0.1)

Upvotes: 8

jay.sf
jay.sf

Reputation: 72613

Using data.table::between in outer.

f <- Vectorize(\(i, j) data.table::between(df1[i, 1L], df2[j, 2], df2[j, 3]))
transform(df2, count=colSums(outer(seq_len(nrow(df1)), seq_len(nrow(df2)), f)))
#   Id      Start        End count
# 1  1 2021-06-01 2021-08-15     4
# 2  2 2021-08-01 2021-09-15     3

Note, that "Date" format is required, so you may want to do this beforehand:

df1[] <- lapply(df1, as.Date)
df2[-1] <- lapply(df2[-1], as.Date)

Data:

df1 <- structure(list(date = structure(c(18839, 18840, 18851, 18853), class = "Date")), row.names = c(NA, 
-4L), class = "data.frame")

df2 <- structure(list(Id = c(1, 2), Start = structure(c(18779, 18840
), class = "Date"), End = structure(c(18854, 18885), class = "Date")), row.names = c(NA, 
-2L), class = "data.frame")

Upvotes: 1

Related Questions