Clayton Glasser
Clayton Glasser

Reputation: 193

Reorder data frame from calendar year to water year using R

I know questions adjacent to this one have been posed before (here and here), but some of the assumptions and goals are different enough that I haven't been able to translate it to this situation. I am also an R novice.

I have a data frame structured like so:

STATION     DATE        PRCP 
USC00352972 1910-01-01  0 
USC00352972 1910-02-01  0   
USC00352972 1910-03-01  0
USC00352972 1910-04-01  0
USC00352972 1910-05-01  0
USC00352972 1910-06-01  0
USC00352972 1910-07-01  0
USC00352972 1910-08-01  0
USC00352972 1910-09-01  0
USC00352972 1910-10-01  0
USC00352972 1910-11-01  0
USC00352972 1910-12-01  0
...         ...         .
US1ORLA0076 2018-01-01  0
US1ORLA0076 2018-02-01  0
US1ORLA0076 2018-03-01  0
US1ORLA0076 2018-04-01  0
US1ORLA0076 2018-05-01  0
US1ORLA0076 2018-06-01  0
US1ORLA0076 2018-07-01  0
US1ORLA0076 2018-08-01  0
US1ORLA0076 2018-09-01  0
US1ORLA0076 2018-10-01  0
US1ORLA0076 2018-11-01  0
US1ORLA0076 2018-12-01  0

The data contains dozens of stations and hundreds of thousands of observations. It is listed alphabetically by station and then ordered by calendar year (Jan-Dec).

I want to rearrange this data set such that it is listed by our "water year" (Oct-Sep). Conceptually, this is as simple as:

For each row (in chronological order) > if row's month is 10-12 > place that row directly above it's station's earliest dated row.

But I doubt that this logic conforms to R vernacular, and I'm unsure how to code it anyways. What is the most conventional way to achieve this result in R? What is the most efficient?

Upvotes: 1

Views: 1064

Answers (2)

AkselA
AkselA

Reputation: 8846

A simple base R approach.
If month is October, November or December, shift year one on.

xd <- as.Date(seq(1, 1500, by=80), origin="1910-01-01")

w.year <- as.numeric(format(xd, "%Y"))
oct.nov.dec <- as.numeric(format(xd, "%m")) > 9
w.year[oct.nov.dec] <- w.year[oct.nov.dec] + 1

data.frame("Calendar_date"=xd, "Water_year"=w.year)

#    Calendar_date Water_year
# 1     1910-01-02       1910
# 2     1910-03-23       1910
# 3     1910-06-11       1910
# 4     1910-08-30       1910
# 5     1910-11-18       1911
# 6     1911-02-06       1911
# 7     1911-04-27       1911
# 8     1911-07-16       1911
# 9     1911-10-04       1912
# 10    1911-12-23       1912
# 11    1912-03-12       1912
# 12    1912-05-31       1912
# 13    1912-08-19       1912
# 14    1912-11-07       1913
# 15    1913-01-26       1913
# 16    1913-04-16       1913
# 17    1913-07-05       1913
# 18    1913-09-23       1913
# 19    1913-12-12       1914

Upvotes: 1

MKR
MKR

Reputation: 20095

One option is to introduce a new column on which data will be arranged. One can subtract 1 year from the date when month is between Oct - Dec so that data for those rows appears with previous years/period.

library(dplyr)
library(lubridate)

df %>% mutate(DATE = ydm(DATE)) %>%
  mutate(WaterPeriod = 
      as.Date(ifelse(month(DATE)>=10, DATE-years(1), DATE),origin = "1970-01-01")) %>%
  arrange(STATION, WaterPeriod) %>%
  select(-WaterPeriod)

Upvotes: 0

Related Questions