Reputation: 539
There is a date column for different ID group and each observations has a number of days to be added number.
library("data.table")
data <- data.table(ID = c(1,1,2,2,3,3,3),
Date =c("01/Sep/2020","11/Sep/2020", "01/Sep/2020",
"08/Sep/2020", "01/Aug/2020", "04/Aug/2020",
"10/Aug/2020"),
days_to_be_added = c(10,10,10,08,05,05,30))
data[, Date := as.Date(Date, format = "%d/%h/%Y")]
ID Date days_to_be_added
1: 1 2020-09-01 10
2: 1 2020-09-11 10
3: 2 2020-09-01 10
4: 2 2020-09-08 8
5: 3 2020-08-01 5
6: 3 2020-08-04 5
7: 3 2020-08-10 30
I have to get the date interval for each ID group such that each date is added to the 'days_to_be_added_group' and number of days is counted between them. If any date overlaps then they are counted only once.
Example : For ID 2 :
3rd row : **1 Sep 2020** to **10 Sep 2020** is 10 days [as Days_to_be_added is 10]
4th row : **8 Sep 2020** to **15 Sep 2020** is 8 days [as Days to be added is 8]
But the total number of days for ID 2 should come as **15 days** since 8 Sep to 10 Sep is overlap for the ID group and should be counted once.
**Expected output:**
ID Number_of_days
1 20
2 15
3 38
```
**Note** If there are any **Date** as "NA" they should be ignored
Upvotes: 1
Views: 79
Reputation: 30474
Here is one approach.
Add rows for each day using seq.Date
starting with Date
for each ID
, and continuing for days_to_be_added
.
Then, the Number_of_days
would be the total number of unique day
values for each ID
, so that overlapping day
would not be double counted.
data[, .(day = seq.Date(Date, by = 'day', length.out = days_to_be_added))
, by = .(ID, 1:nrow(data))
][, .(Number_of_days = uniqueN(day)), by = ID][]
Output
ID Number_of_days
1: 1 20
2: 2 15
3: 3 38
Upvotes: 1