Reputation: 1
I want to calculate the date on which daylight saving time begins for each year from 2003 through 2021, and keep only the observations that fall within 60 days before or after that date in each year.
The start date changes every year (it always falls on a Sunday): it was in April during 2003-2006 and moved to March for 2007-2021.
I need to create a running variable "days" that measures the distance from the daylight saving time start date in each year, with days = 0 on the first day of daylight saving time.
year month day propertycrimes violentcrimes
2003 1 1 94 34
2004 1 1 60 46
2005 1 1 106 41
2006 1 1 87 40
2007 1 1 72 36
2008 1 1 43 50
2009 1 1 35 32
2010 1 1 32 50
2011 1 1 29 45
2012 1 1 32 45
Here's my code so far
library(readr)
dailycrimedataRD <- read_csv("dailycrimedataRD.csv")
View(dailycrimedataRD)
days <- .POSIXct(month, tz="GMT")
Upvotes: 0
Views: 62
Reputation: 21992
How about this:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(readr)
dailycrimedataRD <- read_csv("~/Downloads/dailycrimedataRD.csv")
#> Rows: 6940 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (5): year, month, day, propertycrimes, violentcrimes
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
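tmp <- dailycrimedataRD %>%
  # Build a date-time from year/month/day and flag whether DST is in effect at midnight
  mutate(date = lubridate::ymd(paste(year, month, day, sep = "-"), tz = 'Canada/Eastern'),
         dst = lubridate::dst(date)) %>%
  arrange(date) %>%
  group_by(year) %>%
  # Within each year, take the first date whose midnight is in DST (the previous day's was not)
  mutate(dst_date = date[which(dst == TRUE & lag(dst) == FALSE)],
         diff = (as.Date(dst_date) - as.Date(date))) %>%
  # diff is positive before dst_date, so this keeps dst_date and the 60 days before it
  filter(diff <= 60 & diff >= 0)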
tmp
#> # A tibble: 1,159 × 9
#> # Groups: year [19]
#> year month day propertycrimes violentcrimes date dst
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dttm> <lgl>
#> 1 2003 2 6 68 8 2003-02-06 00:00:00 FALSE
#> 2 2003 2 7 71 8 2003-02-07 00:00:00 FALSE
#> 3 2003 2 8 81 12 2003-02-08 00:00:00 FALSE
#> 4 2003 2 9 68 7 2003-02-09 00:00:00 FALSE
#> 5 2003 2 10 68 9 2003-02-10 00:00:00 FALSE
#> 6 2003 2 11 61 8 2003-02-11 00:00:00 FALSE
#> 7 2003 2 12 73 10 2003-02-12 00:00:00 FALSE
#> 8 2003 2 13 62 14 2003-02-13 00:00:00 FALSE
#> 9 2003 2 14 71 10 2003-02-14 00:00:00 FALSE
#> 10 2003 2 15 90 11 2003-02-15 00:00:00 FALSE
#> # … with 1,149 more rows, and 2 more variables: dst_date <dttm>, diff <drtn>
Created on 2022-04-14 by the reprex package (v2.0.1)
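Two caveats about the pipeline above, as far as I can tell. The filter keeps only the days on or before dst_date, not the 60 days after it. Also, because dst() is evaluated at midnight and the clocks change at 2 a.m., dst_date lands on the Monday after the switch (the printed 2003 window starts on 2003-02-06, which is 60 days before 2003-04-07). If a symmetric window with days = 0 on the Sunday itself is what you need, here is a minimal base-R sketch. The helper name `dst_start`, the toy data frame, and the 'Canada/Eastern' time zone are assumptions for illustration, so swap in your own data and zone.

```r
# Hypothetical helper: the first calendar date in `yr` on which DST is in
# effect at noon local time, i.e. the Sunday on which the clocks change.
dst_start <- function(yr, tz = "Canada/Eastern") {
  days <- seq(as.Date(paste0(yr, "-01-01")),
              as.Date(paste0(yr, "-12-31")), by = "day")
  noon <- as.POSIXct(paste(days, "12:00:00"), tz = tz)
  days[which(as.POSIXlt(noon)$isdst > 0)[1]]
}

# Toy stand-in for dailycrimedataRD: one row per day, 2003 through 2007
dates <- seq(as.Date("2003-01-01"), as.Date("2007-12-31"), by = "day")
toy <- data.frame(year  = as.integer(format(dates, "%Y")),
                  month = as.integer(format(dates, "%m")),
                  day   = as.integer(format(dates, "%d")))
toy$date <- as.Date(paste(toy$year, toy$month, toy$day, sep = "-"))

# One DST start date per year, joined back onto the daily rows
starts <- data.frame(year = 2003:2021)
starts$dst_date <- do.call(c, lapply(starts$year, dst_start))
kept <- merge(toy, starts, by = "year")

# Signed running variable: negative before the switch, 0 on the day itself
kept$days <- as.integer(kept$date - kept$dst_date)
kept <- kept[abs(kept$days) <= 60, ]
```

With this, `kept$days` runs from -60 to 60 within each year. The same two steps (the signed difference and the `abs(days) <= 60` filter) should also drop into the dplyr pipeline in place of `diff` and the final `filter()`.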
Upvotes: 0