Reputation: 163
I have a variable that provides miscellaneous dates. I want to summarize these so they can be factored before being used in a predictive model.
I would like to group the dates by the following:
I'm pretty new to R so any help on this would be much appreciated. Thank you
Upvotes: 0
Views: 49
Reputation: 866
As other commenters have noted, you haven't supplied any data or a reproducible example, but let's give this a go anyway.
I'll be using two tidyverse packages, dplyr and lubridate, to help us out.
For present purposes, let's start by generating some random dates and putting them into a dataframe/tibble. I'm assuming your dates are already in a dataframe and of class Date, as Gregor pointed out above.
library(dplyr); library(lubridate)  # tibble() is re-exported by dplyr
data <- tibble(date = sample(seq(as.Date('2015-01-01'), as.Date('2020-12-31'), by = "day"), 50))
Let's now use dplyr and lubridate to recode the dates into a new variable, date_group:
data %>%
  mutate(date_group = factor(
    case_when(
      year(date) == year(today()) ~ "This Year",
      year(date) == year(today()) - 1 ~ "Last Year",
      date < today() - years(3) ~ "Over 3 Years Ago",  # compare the date itself, not its year
      TRUE ~ "Other"
    )
  ))
For the first two groups, we apply the lubridate function year() (which extracts the year from a date) to the date column in data, and compare the result against the year extracted from today's date (using today()).
For dates over 3 years ago, we subtract 3 years from today's date using years() and compare each date against that cutoff. Note that this is a rolling calculation, unlike the calendar-year comparisons used for this year and last year.
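To make the calendar-year versus rolling distinction concrete, here is a small sketch. The reference date and the example date are made up for illustration, and I've fixed the reference date instead of calling today() so the result doesn't change depending on when you run it.

```r
library(lubridate)

# Fixed reference date (an assumption for reproducibility; the answer above uses today()).
ref <- as.Date("2021-06-15")
d   <- as.Date("2018-09-01")   # hypothetical example date

# Calendar-year comparison: looks only at the year component.
year_gap <- year(ref) - year(d)

# Rolling comparison: is d earlier than three years before ref?
cutoff  <- ref - years(3)
is_over <- d < cutoff

year_gap  # 3 calendar years apart
cutoff    # "2018-06-15"
is_over   # FALSE, because d falls after the cutoff
```

So a date can be three calendar years back and still not count as "Over 3 Years Ago" under the rolling rule.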
Of course, this leaves a gap for dates that fall before last calendar year but within the past 3 years. The final TRUE condition in case_when acts as a default and labels these as "Other".
We wrap the result of the case_when function in factor() so that the resulting groups are treated as a factor rather than a string, ready for subsequent modelling.
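One thing you may want to consider: factor() orders its levels alphabetically by default, and many modelling functions (lm() with the default treatment contrasts, for example) treat the first level as the reference category. If that matters for your model, you can set the order yourself. This is just a sketch with a made-up vector of group labels, and the level order I chose is arbitrary.

```r
# Hypothetical group labels, as case_when might return them.
groups <- c("Other", "This Year", "Over 3 Years Ago", "Last Year", "This Year")

# Default: levels are sorted alphabetically.
default_order <- factor(groups)

# Explicit: levels follow the order given, so "This Year" becomes the first level.
explicit_order <- factor(
  groups,
  levels = c("This Year", "Last Year", "Other", "Over 3 Years Ago")
)

levels(default_order)
levels(explicit_order)
```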
The case_when function is useful (and easy to read) if you have just a few categories. With many categories it gets messy, and you should think about another way to structure the grouping.
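If you do end up with many categories, one possible alternative (not what I used above, just a sketch) is to compute each date's age and bin it with base R's cut(). The break points and labels here are hypothetical, the reference date is fixed for reproducibility, and the age uses an approximate 365.25-day year, so this gives rolling bins and not calendar-year groups.

```r
ref   <- as.Date("2021-06-15")  # fixed reference date, assumed for reproducibility
dates <- as.Date(c("2021-05-01", "2020-01-10", "2019-03-20", "2016-11-05"))

# Age of each date in approximate years relative to ref.
age_years <- as.numeric(difftime(ref, dates, units = "days")) / 365.25

# Hypothetical bins; right = FALSE makes the intervals [0, 1), [1, 2), [2, 3), [3, Inf).
age_group <- cut(
  age_years,
  breaks = c(0, 1, 2, 3, Inf),
  labels = c("Under 1 Year", "1-2 Years", "2-3 Years", "Over 3 Years"),
  right  = FALSE
)

age_group
```

cut() returns a factor directly, so there is no need to wrap it in factor(), and adding another category only means adding one break and one label.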
Upvotes: 1