Jay J
Jay J

Reputation: 155

How to filter by dates and grouping months together in R using dplyr

I have a dataframe (lets call it df1) that looks something like this...

Date          Price
2014-08-06       22
2014-08-06       89
2014-09-15       56
2014-06-04       41
2015-01-19       11
2015-05-23        5
2014-07-21      108

There are other variables in the dataframe but we will ignore them for now, as I do not require them.

I have previously ordered it using

df2 <- df1[order(as.Date(df1$Date, format="%Y/%m/%d")),]

And then created a dataframe containing the values for just one month, for example, just September 2015 dates...

september2015 <- df2[df2$Date >= "2015-09-01" & df2$Date <= "2015-09-30",]

I have done this for all the months in 2015 and 2014. Then I need to create an average of prices within each given month. I have done this by...

mean(september2015$Price, na.rm = TRUE)

Obviously, this is very long and tedious and involves many lines of code. I am trying to make my code more efficient through using the dplyr package.

So far I have...

datesandprices <- select(df2, Date, Price)

datesandprices <- arrange(datesandprices, Date)

summarise(datesandprices, avg = mean(Price, na.rm = TRUE))

Or in a simpler form...

df1 %>%
    select(Date, Price) %>%
    arrange(Date) %>%
    filter(Date >= 2014-08-06 & Date =< 2014-08-30)
    summarise(mean(Price, na.rm = TRUE))

The filter line is not working for me and I can't figure out how to filter by dates using this method. I would like to get the mean for each month without having to calculate it one by one - and ideally extract the monthly means into a new dataframe or column that looks like...

Month         Average
Jan 2014         x
Feb 2014         y
...
Nov 2015         z
Dec 2015         a

I hope this makes sense. I can't find anything on stackoverflow that works with dates, attempting to do something similar to this (unless I am searching for the wrong functions). Many thanks!

Upvotes: 4

Views: 10547

Answers (5)

Julien
Julien

Reputation: 1696

Convert your column to a Date object and use format

df <- data.frame(
  Date = c("2014-08-06", "2014-08-06", "2014-09-15", "2014-06-04", "2015-01-19", "2015-05-23", "2014-07-21"), 
  Price = c(22, 89, 56, 41, 11, 5, 108))

library(dplyr)

df %>%
  group_by(Month_Year = as.Date(Date) %>% format("%b %Y")) %>%
  summarise(avg = mean(Price, na.rm = TRUE))
# A tibble: 6 x 2
  Month_Year   avg
  <chr>      <dbl>
1 août 2014   55.5
2 janv. 2015  11  
3 juil. 2014 108  
4 juin 2014   41  
5 mai 2015     5  
6 sept. 2014  56 

Upvotes: 0

Chabo
Chabo

Reputation: 3000

This should mean your price data by month-year.

library(zoo)

#Pull out columns
Price<-df1["Price"]
Date<-df1["Date"]

#Put in Zoo
zooPrice <- zoo(Price,Date)

#Monthly mean with year (vector)
monthly.avg <- apply.monthly(zooPrice, mean)

#function to change back to DF
zooToDf <- function(z) {
    df <- as.data.frame(z) 
    df$Date <- time(z) #create a Date column
    rownames(df) <- NULL #so row names not filled with dates
    df <- df[,c(ncol(df), 1:(ncol(df)-1))] #reorder columns so Date first
    return(df)
}

#Apply function to create new Df with data!
MonthYearAvg<-zooToDf(monthly.avg)

Upvotes: 0

Uwe
Uwe

Reputation: 42544

For the sake of completeness, here is also a data.table solution:

library(data.table)

# in case  Date is of type character
setDT(df1)[, .(Average = mean(Price, na.rm = TRUE)), keyby = .(Yr.Mon = substr(Date, 1,7))]

# in case Date is of class Date or POSIXct
setDT(df2)[, .(Average = mean(Price, na.rm = TRUE)), keyby = .(Yr.Mon = format(Date, "%Y-%m"))]
    Yr.Mon Average
1: 2014-06    41.0
2: 2014-07   108.0
3: 2014-08    55.5
4: 2014-09    56.0
5: 2015-01    11.0
6: 2015-05     5.0

Note that the grouping variable Yr.Mon is created "on-the-fly" in the keyby clause.

Data

library(data.table)
df1 <- fread(
  "Date          Price
2014-08-06       22
2014-08-06       89
2014-09-15       56
2014-06-04       41
2015-01-19       11
2015-05-23        5
2014-07-21      108")
df2 <- df1[, Date := as.Date(Date)]

Upvotes: 1

Jay J
Jay J

Reputation: 155

I managed to do it using all dplyr functions, with help from @user108636

df %>%
    select(Date, Price) %>%
    arrange(Date) %>%
    mutate(Month_Year = substr(Date, 1,7)) %>%
    group_by(Month_Year) %>%
    summarise(mean(Price, na.rm = TRUE))

The select function selects the date and price columns. The arrange function arranges my dataframe according to the date - with the earliest date first. The mutate function adds another column which excludes the day and leaves us with, for example...

Month_Year
2015-10
2015-10
2015-11
2015-12
2015-12

The group by function groups all the months together and the summarise function calculates the mean of the price of each month.

Upvotes: 1

Katie
Katie

Reputation: 362

I made a separate column in your data set that contains only year and month. Then, I did a group_by on that column to get the means for each month.

Date <- c("2014-08-06", "2014-08-06", "2014-09-15", "2014-06-04", "2015-01-19", "2015-05-23", "2014-07-21")
Price <- c(22,89,56,41,11,5,108)

Date <- as.Date(Date, format="%Y-%m-%d")

df <- data.frame(Date, Price)
df$Month_Year <- substr(df$Date, 1,7)
library(dplyr)

df %>%
  #select(Date, Price) %>%
  group_by(Month_Year) %>%
  summarise(mean(Price, na.rm = TRUE))

Upvotes: 3

Related Questions