dojiny yushin

DST calculation using R

I want to calculate the date on which daylight saving time (DST) begins for each year from 2003 through 2021, and keep only the observations that fall within 60 days before or after that date in each year.

The start date changes every year (it always falls on a Sunday): it fell in April in 2003-2006 and in March in 2007-2021.

I need to create a running variable "days" that measures the distance in days from the DST start date in each year, with days = 0 on the first day of daylight saving time.

Here's a sample of the dataset:

year  month  day  propertycrimes  violentcrimes
2003      1    1              94             34
2004      1    1              60             46
2005      1    1             106             41
2006      1    1              87             40
2007      1    1              72             36
2008      1    1              43             50
2009      1    1              35             32
2010      1    1              32             50
2011      1    1              29             45
2012      1    1              32             45

Here's my code so far:

library(readr)
dailycrimedataRD <- read_csv("dailycrimedataRD.csv")
View(dailycrimedataRD)
days <- .POSIXct(month, tz="GMT")

Answers (1)

DaveArmstrong

How about this:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(readr)
dailycrimedataRD <- read_csv("~/Downloads/dailycrimedataRD.csv")
#> Rows: 6940 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (5): year, month, day, propertycrimes, violentcrimes
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

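tmp <- dailycrimedataRD %>% 
  # Build a date-time (midnight, local time) from the year/month/day columns
  mutate(date = lubridate::ymd(paste(year, month, day, sep="-"), tz='Canada/Eastern'), 
         # TRUE when that midnight falls within daylight saving time
         dst = lubridate::dst(date)) %>% 
  arrange(date) %>% 
  group_by(year) %>% 
  # First row in each year where dst flips from FALSE to TRUE.
  # Note: the clocks change at 2:00, so midnight on the changeover Sunday is
  # still standard time and this picks the following day (Monday).
  mutate(dst_date = date[which(dst == TRUE & lag(dst) == FALSE)], 
         # Positive values are days before dst_date
         diff = (as.Date(dst_date) - as.Date(date))) %>% 
  # Keeps only the 60 days up to and including dst_date
  filter(diff <= 60 & diff >= 0)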

tmp
#> # A tibble: 1,159 × 9
#> # Groups:   year [19]
#>     year month   day propertycrimes violentcrimes date                dst  
#>    <dbl> <dbl> <dbl>          <dbl>         <dbl> <dttm>              <lgl>
#>  1  2003     2     6             68             8 2003-02-06 00:00:00 FALSE
#>  2  2003     2     7             71             8 2003-02-07 00:00:00 FALSE
#>  3  2003     2     8             81            12 2003-02-08 00:00:00 FALSE
#>  4  2003     2     9             68             7 2003-02-09 00:00:00 FALSE
#>  5  2003     2    10             68             9 2003-02-10 00:00:00 FALSE
#>  6  2003     2    11             61             8 2003-02-11 00:00:00 FALSE
#>  7  2003     2    12             73            10 2003-02-12 00:00:00 FALSE
#>  8  2003     2    13             62            14 2003-02-13 00:00:00 FALSE
#>  9  2003     2    14             71            10 2003-02-14 00:00:00 FALSE
#> 10  2003     2    15             90            11 2003-02-15 00:00:00 FALSE
#> # … with 1,149 more rows, and 2 more variables: dst_date <dttm>, diff <drtn>

Created on 2022-04-14 by the reprex package (v2.0.1)
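
If you need the window on both sides of the switch, with days = 0 on the changeover Sunday itself, something along these lines might work. It is only a sketch in base R under a few assumptions: the helper name dst_start, the toy data frame crime, and the "America/Toronto" time zone are all my own placeholders (swap in the zone your data actually comes from), and it relies on the time zone database installed on your system. The idea is to test each calendar day at noon rather than midnight, so that the changeover day is already flagged as DST.

```r
# Hypothetical helper: first calendar day of `year` on which DST is in
# effect at noon local time in `tz` (tz is an assumption; adjust as needed).
dst_start <- function(year, tz = "America/Toronto") {
  days <- seq(as.Date(paste0(year, "-01-01")),
              as.Date(paste0(year, "-12-31")), by = "day")
  # Noon is safely after the 2:00 clock change on the changeover day
  noon <- as.POSIXct(paste(days, "12:00:00"), tz = tz)
  is_dst <- as.POSIXlt(noon)$isdst > 0
  days[which(is_dst)[1]]
}

# Toy stand-in for the crime data: one row per calendar day, 2003-2008,
# with the same year/month/day columns as in the question (no crime counts).
all_days <- seq(as.Date("2003-01-01"), as.Date("2008-12-31"), by = "day")
crime <- data.frame(
  year  = as.integer(format(all_days, "%Y")),
  month = as.integer(format(all_days, "%m")),
  day   = as.integer(format(all_days, "%d"))
)

# Rebuild a Date column from year/month/day
crime$date <- as.Date(sprintf("%d-%02d-%02d", crime$year, crime$month, crime$day))

# Look up the DST start date once per year and attach it to every row
years <- unique(crime$year)
start_by_year <- as.Date(vapply(years, function(y) format(dst_start(y)),
                                character(1)))
crime$dst_date <- start_by_year[match(crime$year, years)]

# Running variable: negative before the switch, 0 on the first DST day,
# positive afterwards
crime$days <- as.integer(crime$date - crime$dst_date)

# Keep the 60 days before and the 60 days after the switch in each year
window <- crime[abs(crime$days) <= 60, ]
```

With this time zone, dst_start(2003) should come out as April 6 and dst_start(2007) as March 11, which matches the April-to-March shift described in the question. The same logic could be written with dplyr and lubridate; base R is used here only to keep the sketch free of package dependencies.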
