Learning_R
Learning_R

Reputation: 659

How do I group my date variable into month/year in R?

I have a "date" vector, that contains dates in mm/dd/yyyy format:

 head(Entered_Date,5)
[1] 1/5/1998 1/5/1998 1/5/1998 1/5/1998 1/5/1998

I am trying to plot a frequency variable against the date, but I want to group the dates that it is by month or year. As it is now, there is a frequency per day, but I want to plot the frequency by month or year. So instead of having a frequency of 1 for 1/5/1998, 1 for 1/7/1998, and 3 for 1/8/1998, I would like to display it as 5 for 1/1998. It is a relatively large data set, with dates from 1998 to present, and I would like to find some automated way to accomplish this.

> dput(head(Entered_Date))
structure(c(260L, 260L, 260L, 260L, 260L, 260L), .Label = c("1/1/1998", 
"1/1/1999", "1/1/2001", "1/1/2002", "1/10/2000", "1/10/2001", 
"1/10/2002", "1/10/2003", "1/10/2005", "1/10/2006", "1/10/2007", 
"1/10/2008", "1/10/2011", "1/10/2012", "1/10/2013", "1/11/1999", 
"1/11/2000", "1/11/2001", "1/11/2002", "1/11/2005", "1/11/2006", 
"1/11/2008", "1/11/2010", "1/11/2011", "1/11/2012", "1/11/2013", 
"1/12/1998", "1/12/1999", "1/12/2001", "1/12/2004", "1/12/2005", ...

Upvotes: 55

Views: 128661

Answers (8)

rm1104
rm1104

Reputation: 341

There is also group_by(month_yr = cut(date, breaks = "1 month") in base R, without needing to use lubridate or other packages.

Upvotes: 0

Maël
Maël

Reputation: 52329

Another solution is slider::slide_period:

library(slider)
library(dplyr)

monthly_summary <- function(data) summarise(data, date = format(max(date), "%Y-%m"), value = sum(value))
slide_period_dfr(df, df$date, "month", monthly_summary)

      date value
1  1998-01    92
2  1998-02    82
3  1998-03   113
4  1998-04    94
5  1998-05    92
6  1998-06    74
7  1998-07    89
8  1998-08    92
9  1998-09    91
10 1998-10   100
...

Upvotes: 0

Hugh Sawbridge
Hugh Sawbridge

Reputation: 851

The floor_date() function from the lubridate package does this nicely.

data %>% 
    group_by(month = lubridate::floor_date(date, "month")) %>%
    summarize(summary_variable = sum(value))

Thanks to Roman Cheplyaka https://ro-che.info/articles/2017-02-22-group_by_month_r

See more on how to use the function: https://lubridate.tidyverse.org/reference/round_date.html

Upvotes: 85

Ben G
Ben G

Reputation: 4338

Just to add to @cdeterman answer, you can use lubridate along with dplyr to make this even easier:

df <- data.frame(date = seq.Date(from =as.Date("01/01/1998", "%d/%m/%Y"), 
                                 to=as.Date("01/01/2000", "%d/%m/%Y"), by="day"),
                 value = sample(seq(5), 731, replace = TRUE))

library(dplyr)
library(lubridate)

df %>%
mutate(month = month(date), year = year(date)) %>%
group_by(month, year) %>%
summarise(total = sum(value))

Upvotes: 7

user7453767
user7453767

Reputation: 339

There is a super easy way using the cut() function:

    list = as.Date(c("1998-5-2", "1993-4-16", "1998-5-10"))
    cut(list, breaks = "month")

and you will get this:

    [1] 1998-05-01 1993-04-01 1998-05-01
    62 Levels: 1993-04-01 1993-05-01 1993-06-01 1993-07-01 1993-08-01 ... 1998-05-01

Upvotes: 2

CCurtis
CCurtis

Reputation: 1932

Dont need dplyr. Look at ?as.POSIXlt

df$date<-as.POSIXlt(df$date)
mon<-df$date$mon
yr<-df$date$year
monyr<-as.factor(paste(mon,yr,sep="/"))
df$date<-monyr

Don't need to use ggplot2 but its nice for this kind of thing.

c <- ggplot(df, aes(factor(date)))
c + geom_bar()

If you want to see the actual numbers

aggregate(. ~ date,data = df,FUN=length )
df2<-aggregate(. ~ date,data = df,FUN=length )
df2
    date value
1   0/98    31
2   0/99    31
3   1/98    28
4   1/99    28
5  10/98    30
6  10/99    30
7  11/97     1
8  11/98    31
9  11/99    31
10  2/98    31
11  2/99    31
12  3/98    30
13  3/99    30
14  4/98    31
15  4/99    31
16  5/98    30
17  5/99    30
18  6/98    31
19  6/99    31
20  7/98    31
21  7/99    31
22  8/98    30
23  8/99    30
24  9/98    31
25  9/99    31

Upvotes: 2

cdeterman
cdeterman

Reputation: 19970

Here is an example using dplyr. You simply use the corresponding date format string for month %m or year %Y in the format statement.

set.seed(123)
df <- data.frame(date = seq.Date(from =as.Date("01/01/1998", "%d/%m/%Y"), 
                                 to=as.Date("01/01/2000", "%d/%m/%Y"), by="day"),
                 value = sample(seq(5), 731, replace = TRUE))

head(df)
        date value
1 1998-01-01     2
2 1998-01-02     4
3 1998-01-03     3
4 1998-01-04     5
5 1998-01-05     5
6 1998-01-06     1

library(dplyr)

df %>%
mutate(month = format(date, "%m"), year = format(date, "%Y")) %>%
group_by(month, year) %>%
summarise(total = sum(value))

Source: local data frame [25 x 3]
Groups: month [?]

   month  year total
   (chr) (chr) (int)
1     01  1998   105
2     01  1999    91
3     01  2000     3
4     02  1998    74
5     02  1999    77
6     03  1998    96
7     03  1999    86
8     04  1998    91
9     04  1999    95
10    05  1998    93
..   ...   ...   ...

Upvotes: 40

J. Win.
J. Win.

Reputation: 6771

Maybe you just add a column in your data like this:

Year <- format(as.Date(Entered_Date, "%d/%m/%Y"), "%Y")

Upvotes: 2

Related Questions