JH2021
JH2021

Reputation: 1

Count rows for each ID that fall between two dates, when using two tables

For each ID in one table (Table 1), how can I count the number of rows in another table (Table 2) which fall between a start date and end date (two columns in the first table) in R?

Table 1:

contact_number start_date end_date
123 01/01/2023 01/01/2024
1234 01/01/2022 01/01/2023

Table 2:

contact_number visit_type date
123 1 02/01/2023
123 2 05/01/2023
1234 2 05/01/2022

Output wanted:

contact_number start_date end_date visits
123 01/01/2023 01/01/2024 2
1234 01/01/2022 01/01/2023 1

I am wanting to replicate a Power BI DAX query:

visits_per_membership = 
CALCULATE(
    COUNTROWS('Table 2'),
    FILTER(
        'Table 2',
        'Table 2'[contact_number] = Table 1[contact_number] && 'Table 2'[EntryDate].[Date] >= Table 1[start_date].[Date] && 'Table 2'[EntryDate].[Date] < Table 1[renewal_date].[Date])
)

Upvotes: 0

Views: 321

Answers (2)

zephryl
zephryl

Reputation: 17204

Base R solution, mapping over rows of table1 and comparing to table2 by id and dates:

table1$visits <- mapply(
  \(contact, start, end) sum(
    table2$contact_number == contact & 
    table2$date >= start &
    table2$date <= end
  ),
  table1$contact_number,
  table1$start_date,
  table1$end_date
  )

table1
  contact_number start_date   end_date visits
1            123 2023-01-01 2024-01-01      2
2           1234 2022-01-01 2023-01-01      1

Or using dplyr::left_join():

library(dplyr) # >= v1.1.0:

table1 %>%
  left_join(table2, multiple = "all") %>%
  summarize(
    visits = sum(between(date, start_date, end_date)),
    .by = c(contact_number, start_date, end_date)
  )

Upvotes: 0

Wimpel
Wimpel

Reputation: 27762

data.table approach

dt1[, visits := dt2[dt1, on = .(date >= start_date, date <= end_date)][, .N, by = contact_number]$N]

   contact_number start_date   end_date visits
1:            123 2023-01-01 2024-01-01      2
2:           1234 2022-01-01 2023-01-01      1

sample data used

dt1 <- fread("contact_number    start_date  end_date
123     01/01/2023  01/01/2024
1234    01/01/2022  01/01/2023")

dt2 <- fread("contact_number    visit_type  date
123     1   02/01/2023
123     2   05/01/2023
1234    2   05/01/2022")

dt1[, c("start_date", "end_date") := lapply(.SD, as.Date, format = "%d/%m/%Y"), 
    .SDcols = c("start_date", "end_date")]
dt2[, date := as.Date(date, format = "%d/%m/%Y")]

Upvotes: 0

Related Questions