Reputation: 193
I know questions adjacent to this one have been posed before (here and here), but some of the assumptions and goals are different enough that I haven't been able to translate it to this situation. I am also an R novice.
I have a data frame structured like so:
STATION DATE PRCP
USC00352972 1910-01-01 0
USC00352972 1910-02-01 0
USC00352972 1910-03-01 0
USC00352972 1910-04-01 0
USC00352972 1910-05-01 0
USC00352972 1910-06-01 0
USC00352972 1910-07-01 0
USC00352972 1910-08-01 0
USC00352972 1910-09-01 0
USC00352972 1910-10-01 0
USC00352972 1910-11-01 0
USC00352972 1910-12-01 0
... ... .
US1ORLA0076 2018-01-01 0
US1ORLA0076 2018-02-01 0
US1ORLA0076 2018-03-01 0
US1ORLA0076 2018-04-01 0
US1ORLA0076 2018-05-01 0
US1ORLA0076 2018-06-01 0
US1ORLA0076 2018-07-01 0
US1ORLA0076 2018-08-01 0
US1ORLA0076 2018-09-01 0
US1ORLA0076 2018-10-01 0
US1ORLA0076 2018-11-01 0
US1ORLA0076 2018-12-01 0
The data contains dozens of stations and hundreds of thousands of observations. It is listed alphabetically by station and then ordered by calendar year (Jan-Dec).
I want to rearrange this data set such that it is listed by our "water year" (Oct-Sep). Conceptually, this is as simple as:
For each row (in chronological order) > if row's month is 10-12 > place that row directly above it's station's earliest dated row.
But I doubt that this logic conforms to R vernacular, and I'm unsure how to code it anyways. What is the most conventional way to achieve this result in R? What is the most efficient?
Upvotes: 1
Views: 1064
Reputation: 8846
A simple base R approach.
If month is October, November or December, shift year one on.
xd <- as.Date(seq(1, 1500, by=80), origin="1910-01-01")
w.year <- as.numeric(format(xd, "%Y"))
oct.nov.dec <- as.numeric(format(xd, "%m")) > 9
w.year[oct.nov.dec] <- w.year[oct.nov.dec] + 1
data.frame("Calendar_date"=xd, "Water_year"=w.year)
# Calendar_date Water_year
# 1 1910-01-02 1910
# 2 1910-03-23 1910
# 3 1910-06-11 1910
# 4 1910-08-30 1910
# 5 1910-11-18 1911
# 6 1911-02-06 1911
# 7 1911-04-27 1911
# 8 1911-07-16 1911
# 9 1911-10-04 1912
# 10 1911-12-23 1912
# 11 1912-03-12 1912
# 12 1912-05-31 1912
# 13 1912-08-19 1912
# 14 1912-11-07 1913
# 15 1913-01-26 1913
# 16 1913-04-16 1913
# 17 1913-07-05 1913
# 18 1913-09-23 1913
# 19 1913-12-12 1914
Upvotes: 1
Reputation: 20095
One option is to introduce a new column on which data will be arranged. One can subtract 1 year
from the date when month is between Oct - Dec
so that data for those rows appears with previous years/period.
library(dplyr)
library(lubridate)
df %>% mutate(DATE = ydm(DATE)) %>%
mutate(WaterPeriod =
as.Date(ifelse(month(DATE)>=10, DATE-years(1), DATE),origin = "1970-01-01")) %>%
arrange(STATION, WaterPeriod) %>%
select(-WaterPeriod)
Upvotes: 0