user7729135
user7729135

Reputation: 409

Grouping 1st and last date of a particular week

I've a dataframe containing dates from 2015-2019. It looks something like this:

date       | week | value
2016-01-01 | 53   | 3
2016-01-02 | 53   | 7
2016-01-03 | 53   | 1
2016-01-04 | 1    | 9
2016-01-05 | 1    | 4
2016-01-06 | 1    | 8
2016-01-07 | 1    | 4
2016-01-08 | 1    | 5
2016-01-09 | 1    | 6
2016-01-10 | 1    | 20
.
.
.
2016-12-31 | 52   | 31
2017-01-01 | 52   | 2
2017-01-02 | 52   | 49

The above is just a snippet of the dataframe with 1st column containing every single dates from 2015-2019, while second column is the isoweek (using lubridate) that the date belongs to. I am trying to obtain and group the dates in such a way that it will show only the first date and last date of each isoweek with one of them being this:

2016-01-04 | 1    | 9
2016-01-10 | 1    | 20

This is straightforward using dplyr's group_by (year and isoweek) and then filtering by row numbers. But the issue i have here is that some of the final weeks of a year spills over to the following year such as week 52 of 2016. In that instance i am not able to group by year and then week to obtain first and last row since I will have two sets of week 52 at the end of 2016 and at the start of 2017. Is there a way to obtain the first row and last row of each week without the need to look at the year. i.e. filtering for just the first entry and last entry of each week noting here that there are multiple week 1s, 2s 3s and so on from different years. Ideally it should throw out something like this:

date       | week | value
2016-01-01 | 53   | 3
2016-01-03 | 53   | 1
2016-01-04 | 1    | 9
2016-01-10 | 1    | 20
.
.
.
2016-12-31 | 52   | 31
2017-01-02 | 52   | 49
2017-01-03 | 1    | 34
2017-01-09 | 1    | 2

Upvotes: 0

Views: 89

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 389325

We can compare week with previous value using lag and take 1st and last value from this new group.

library(dplyr)

df %>%
  group_by(grp = cumsum(week != lag(week, default = first(week)))) %>%
  slice(1, n())

Upvotes: 0

Frank Zhang
Frank Zhang

Reputation: 1688

One approach is making a new id column for grouping

library(data.table)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:data.table':
#> 
#>     hour, isoweek, mday, minute, month, quarter, second, wday, week,
#>     yday, year
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

dt <- data.table(date = seq(ymd(20160101),ymd(20191231),"days"),value=runif(1461,0,10))


dt[,isoweek:=isoweek(date)]

dt[,id:=paste0(rleid(isoweek),"-",isoweek)]

dt[,.SD[c(1,.N),],by=.(id)]
#>          id       date    value isoweek
#>   1:   1-53 2016-01-01 7.231343      53
#>   2:   1-53 2016-01-03 8.732023      53
#>   3:    2-1 2016-01-04 9.292968       1
#>   4:    2-1 2016-01-10 8.502751       1
#>   5:    3-2 2016-01-11 9.123189       2
#>  ---                                   
#> 416: 208-51 2019-12-22 7.619993      51
#> 417: 209-52 2019-12-23 4.538770      52
#> 418: 209-52 2019-12-29 6.318457      52
#> 419:  210-1 2019-12-30 4.570501       1
#> 420:  210-1 2019-12-31 7.968364       1

dt[,.SD[c(1,.N),],by=.(id)][id=="53-52",]
#>       id       date    value isoweek
#> 1: 53-52 2016-12-26 5.570343      52
#> 2: 53-52 2017-01-01 4.577057      52

Created on 2020-04-19 by the reprex package (v0.3.0)

Upvotes: 0

Related Questions