Reputation: 1778
This is embarrassing: I admit several pieces of code of mine have a similar structure to this:
Arghhhh! Do real programmers cringe when they see something like this?
The figure should be self-explanatory: I need a variable called Season that returns an ID "(year-1)_(year)" based on the WEATHERDATE column.
So, any row with dates from '1998-06-15' to '1999-06-14' should return "1998-1999" under the Season ID Column.
The WEATHERDATE column simply runs from 1998-01-01 to TODAY.
I bet there's a one/two liner solution to this. I've tried the zoo package but no success.
Any R wizard to point me to the obvious solution I'm missing here?
***REPRODUCIBLE EXAMPLE:
WEATHERDATE <- seq(as.Date("1998-01-01"), len=99999, by=1)
VARIABLE <- rnorm(n = length(WEATHERDATE))
data_mex <- data.frame(WEATHERDATE, VARIABLE)
## how to create SEASON based on dates??
# I would then run the code block from above, something like:
data_mex <- within(data_mex, Season[DATEWEATHER >= ymd(StartOfSeason)+365*0 & DATEWEATHER < ymd(StartOfSeason)+365*1 ] <- "1998-1999")
Of course a solution where I don't have to come back and touch the code every year would be ideal :)
Upvotes: 1
Views: 181
Reputation: 8458
You can use cut.Date
and seq.Date
to do this in two lines in base R:
seasons <- format(as.Date(cut.Date(as.Date(data_mex$WEATHERDATE),
breaks=seq.Date(as.Date("1997-06-15"),
as.Date("2280-06-15", "year"))), "%Y")
data_mex$seasons <- paste0(seasons, "-", as.numeric(seasons) + 1)
Note I have used "2280-06-15"
as the latest date for your seasons, but you may want to replace that with Sys.Date()
, or whenever is appropriate for your task.
This returns:
> head(data_mex)
WEATHERDATE VARIABLE seasons
1 1998-01-01 -0.2260734 1997-1998
2 1998-01-02 0.3222805 1997-1998
3 1998-01-03 -0.1554167 1997-1998
4 1998-01-04 -0.5591154 1997-1998
5 1998-01-05 1.0729737 1997-1998
6 1998-01-06 1.0030025 1997-1998
> tail(data_mex)
WEATHERDATE VARIABLE seasons
99994 2271-10-10 0.59986466 2271-2272
99995 2271-10-11 0.37304603 2271-2272
99996 2271-10-12 1.30822156 2271-2272
99997 2271-10-13 0.01204986 2271-2272
99998 2271-10-14 0.87340544 2271-2272
99999 2271-10-15 0.44098083 2271-2272
Upvotes: 6
Reputation: 10483
Here is one solution you can use to get the desired output:
data_mex$seasonId <-
with(data_mex,
ifelse(as.numeric(format(WEATHERDATE, '%m')) >= 6 &
as.numeric(format(WEATHERDATE, '%d')) >= 15,
paste(format(WEATHERDATE, '%Y'),
as.numeric(format(WEATHERDATE, '%Y')) + 1, sep = '-'),
paste(as.numeric(format(WEATHERDATE, '%Y')) - 1,
format(WEATHERDATE, '%Y'), sep = '-')))
Resulting output is:
head(data_mex)
WEATHERDATE VARIABLE seasonId
1 1998-01-01 -0.02541724 1997-1998
2 1998-01-02 -0.48725913 1997-1998
3 1998-01-03 0.06204881 1997-1998
4 1998-01-04 -0.15207281 1997-1998
5 1998-01-05 -0.61809248 1997-1998
6 1998-01-06 -1.20427734 1997-1998
tail(data_mex)
WEATHERDATE VARIABLE seasonId
99994 2271-10-10 -0.8976144 2271-2272
99995 2271-10-11 -0.5684972 2271-2272
99996 2271-10-12 -0.8847031 2271-2272
99997 2271-10-13 -0.2659979 2271-2272
99998 2271-10-14 -0.9108352 2271-2272
99999 2271-10-15 1.1601581 2271-2272
You extract the year portion each time, and paste the lagging or leading year based on the month and day value. format
method is allowing you extract specific portions of the date.
Upvotes: 1