Reputation: 181
My data are as follows:
group begin.date end.date detect yearwk_detect
A 2019-07-22 2019-07-29 2 201930
A 2019-07-29 2019-08-03 5 201931
A 2019-08-03 2019-08-09 0 201932
A 2019-08-18 2019-08-30 8 201934
B 2019-07-23 2019-07-30 8 201930
B 2019-07-30 2019-08-04 1 201931
B 2019-08-06 2019-08-15 9 201932
B 2019-08-19 2019-08-29 6 201935
begin.date = date data collection began
end.date = date data collection ended
detect = number of detections
yearwk_detect = the year/isoweek in which those detections occurred
If the end.date is not equal to the begin.date in the subsequent row within each group, it means that there is a temporal gap in data collection.
For each group, and within each yearwk_detect, I would like to know how many days data collection occurred. My temporal unit of analysis is yearwk_detect, however the data collection periods (i.e., the range between begin.date and end.date) are rarely at that year-week unit of analysis. My intent is therefore to identify how many days per every year-week (per group) did data collection occur (or not occur).
Edit/addition: If a temporal gap is identified as a whole year-week, and there is no current row for that year-week, add a row for it and indicate 'temporal gap' under the 'detect' column.
EDIT
The purpose of this edit is to clarify what I am looking for. My data are as follows:
df <- read_table("begin.date end.date
2019-07-22 2019-07-29
2019-07-29 2019-08-03
2019-08-25 2019-08-30
2019-08-30 2019-09-24
2019-09-30 2019-10-05")
I would like to assign two new columns:
isoweek_id = every isoweek in the year (so there will be one row for every week in the year)
data_days = the number of days data collection occurred within that isoweek given the begin.date and end.date, which represent date ranges when data collection occurred.
We might, therefore, have weeks when the number of days data collection occurred is 0 if, for example, a temporal gap in data collection spanned more than one isoweek. (note: my real data have several years worth of data collection).
My desired output would look something like this:
begin.date end.date isoweek_id data_days
NA NA 29 0
2019-07-22 2019-07-29 30 7
2019-07-29 2019-08-03 31 6
NA NA 32 0
NA NA 33 0
2019-08-25 2019-08-30 34 1
2019-08-25 2019-08-30 35 5
2019-08-30 2019-09-24 36 7
2019-08-30 2019-09-24 37 7
2019-08-30 2019-09-24 38 7
2019-08-30 2019-09-24 39 2
2019-09-30 2019-10-05 40 6
NA NA 41 0
NA NA 42 0
NA NA 43 0
You can look at which isoweeks span which dates as follows:
library(ISOweek)
w <- paste("2019-W35", 1:7, sep = "-")
data.frame(weekdate = w, date = ISOweek2date(w))
Thank you in advance.
Upvotes: 0
Views: 59
Reputation: 6583
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
df <- read_table("begin.date end.date
2019-07-22 2019-07-29
2019-07-29 2019-08-03
2019-08-25 2019-08-30
2019-08-30 2019-09-24
2019-09-30 2019-10-05") %>%
mutate(weeks = isoweek(begin.date)) %>%
left_join(tibble(weeks = seq(as.Date("2022-01-01"),
as.Date("2022-12-31"), by = "day") %>%
isoweek() %>%
unique), . )
#> Joining, by = "weeks"
df %>%
filter(weeks >= 29) %>%
group_by(weeks) %>%
mutate(data_days = end.date - begin.date)
#> # A tibble: 24 x 4
#> # Groups: weeks [24]
#> weeks begin.date end.date data_days
#> <dbl> <date> <date> <drtn>
#> 1 52 NA NA NA days
#> 2 29 NA NA NA days
#> 3 30 2019-07-22 2019-07-29 7 days
#> 4 31 2019-07-29 2019-08-03 5 days
#> 5 32 NA NA NA days
#> 6 33 NA NA NA days
#> 7 34 2019-08-25 2019-08-30 5 days
#> 8 35 2019-08-30 2019-09-24 25 days
#> 9 36 NA NA NA days
#> 10 37 NA NA NA days
#> # ... with 14 more rows
Created on 2022-06-28 by the reprex package (v2.0.1)
Upvotes: 1