Deb
Deb

Reputation: 539

Count Date intervals by group in R taking into account that when date overlap just count them once

There is a date column for different ID group and each observations has a number of days to be added number.

library("data.table")

data <- data.table(ID = c(1,1,2,2,3,3,3),
                   Date =c("01/Sep/2020","11/Sep/2020", "01/Sep/2020",
                           "08/Sep/2020", "01/Aug/2020", "04/Aug/2020",
                           "10/Aug/2020"),
                   days_to_be_added = c(10,10,10,08,05,05,30))

data[, Date := as.Date(Date, format = "%d/%h/%Y")]

   ID       Date      days_to_be_added
1:  1 2020-09-01               10
2:  1 2020-09-11               10
3:  2 2020-09-01               10
4:  2 2020-09-08                8
5:  3 2020-08-01                5
6:  3 2020-08-04                5
7:  3 2020-08-10               30

I have to get the date interval for each ID group such that each date is added to the 'days_to_be_added_group' and number of days is counted between them. If any date overlaps then they are counted only once.

Example : For ID 2 :

3rd row : **1 Sep 2020** to **10 Sep 2020** is 10 days [as Days_to_be_added is 10]
4th row : **8 Sep 2020** to **15 Sep 2020** is 8 days [as Days to be added is 8]
But the total number of days for ID 2 should come as **15 days** since 8 Sep to 10 Sep is overlap for the ID group and should be counted once.
**Expected output:**

ID  Number_of_days
1    20
2    15
3    38
```

**Note** If there are any **Date** as "NA" they should be ignored

Upvotes: 1

Views: 79

Answers (1)

Ben
Ben

Reputation: 30474

Here is one approach.

Add rows for each day using seq.Date starting with Date for each ID, and continuing for days_to_be_added.

Then, the Number_of_days would be the total number of unique day values for each ID, so that overlapping day would not be double counted.

data[, .(day = seq.Date(Date, by = 'day', length.out = days_to_be_added))
     , by = .(ID, 1:nrow(data))
     ][, .(Number_of_days = uniqueN(day)), by = ID][]

Output

   ID Number_of_days
1:  1             20
2:  2             15
3:  3             38

Upvotes: 1

Related Questions