Reputation: 409
I've a dataframe containing dates from 2015-2019. It looks something like this:
date | week | value
2016-01-01 | 53 | 3
2016-01-02 | 53 | 7
2016-01-03 | 53 | 1
2016-01-04 | 1 | 9
2016-01-05 | 1 | 4
2016-01-06 | 1 | 8
2016-01-07 | 1 | 4
2016-01-08 | 1 | 5
2016-01-09 | 1 | 6
2016-01-10 | 1 | 20
.
.
.
2016-12-31 | 52 | 31
2017-01-01 | 52 | 2
2017-01-02 | 52 | 49
The above is just a snippet of the dataframe with 1st column containing every single dates from 2015-2019, while second column is the isoweek (using lubridate) that the date belongs to. I am trying to obtain and group the dates in such a way that it will show only the first date and last date of each isoweek with one of them being this:
2016-01-04 | 1 | 9
2016-01-10 | 1 | 20
This is straightforward using dplyr's group_by (year and isoweek) and then filtering by row numbers. But the issue i have here is that some of the final weeks of a year spills over to the following year such as week 52 of 2016. In that instance i am not able to group by year and then week to obtain first and last row since I will have two sets of week 52 at the end of 2016 and at the start of 2017. Is there a way to obtain the first row and last row of each week without the need to look at the year. i.e. filtering for just the first entry and last entry of each week noting here that there are multiple week 1s, 2s 3s and so on from different years. Ideally it should throw out something like this:
date | week | value
2016-01-01 | 53 | 3
2016-01-03 | 53 | 1
2016-01-04 | 1 | 9
2016-01-10 | 1 | 20
.
.
.
2016-12-31 | 52 | 31
2017-01-02 | 52 | 49
2017-01-03 | 1 | 34
2017-01-09 | 1 | 2
Upvotes: 0
Views: 89
Reputation: 389325
We can compare week
with previous value using lag
and take 1st and last value from this new group.
library(dplyr)
df %>%
group_by(grp = cumsum(week != lag(week, default = first(week)))) %>%
slice(1, n())
Upvotes: 0
Reputation: 1688
One approach is making a new id
column for grouping
library(data.table)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:data.table':
#>
#> hour, isoweek, mday, minute, month, quarter, second, wday, week,
#> yday, year
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
dt <- data.table(date = seq(ymd(20160101),ymd(20191231),"days"),value=runif(1461,0,10))
dt[,isoweek:=isoweek(date)]
dt[,id:=paste0(rleid(isoweek),"-",isoweek)]
dt[,.SD[c(1,.N),],by=.(id)]
#> id date value isoweek
#> 1: 1-53 2016-01-01 7.231343 53
#> 2: 1-53 2016-01-03 8.732023 53
#> 3: 2-1 2016-01-04 9.292968 1
#> 4: 2-1 2016-01-10 8.502751 1
#> 5: 3-2 2016-01-11 9.123189 2
#> ---
#> 416: 208-51 2019-12-22 7.619993 51
#> 417: 209-52 2019-12-23 4.538770 52
#> 418: 209-52 2019-12-29 6.318457 52
#> 419: 210-1 2019-12-30 4.570501 1
#> 420: 210-1 2019-12-31 7.968364 1
dt[,.SD[c(1,.N),],by=.(id)][id=="53-52",]
#> id date value isoweek
#> 1: 53-52 2016-12-26 5.570343 52
#> 2: 53-52 2017-01-01 4.577057 52
Created on 2020-04-19 by the reprex package (v0.3.0)
Upvotes: 0