NewCommer
NewCommer

Reputation: 37

Aggregate hourly data into monthly data starting with the yyyy-mm-dd h:m format in R

I've been actively looking for a solution to my question in R and did not find anything to solve my problem...

I have an R report to submit for the beginning of January, using pepe memes data. I am studying the price of pepe memes through times, and here comes my problem. I have the dates in the format yyyy-mm-dd h:m, and I want to aggregate those into means of monthly data. I was thinking about first making a new file, with my timestamp in format yyyy-mm but I am not able to this. I was successful when translating into the yyyy-mm-dd format, but I really have an issue when I want to go to the yyyy-mm format.

So, more clearly, here are my two questions :

Here are some rows of my dataset (just an abstract, it contains more than 250 rows):

   Timestamp           ForwardQuantity TotalPriceUSDPerUnit
------------------------------------------------------------
 1 2016-09-26 04:00:00               3                 3.44
 2 2016-09-26 04:00:00               7                 3.44
 3 2016-09-26 05:00:00               3                 3.39
 4 2016-09-26 05:00:00               1                 3.39
 5 2016-09-26 06:00:00               2                 3.39
 6 2016-09-26 13:00:00               4                 2.84
 7 2016-09-28 04:00:00               1                 2.88
 8 2016-09-28 04:00:00               1                 2.92
 9 2016-09-28 06:00:00               1                 2.92
10 2016-09-28 06:00:00               1                 2.92 

Many thanks in advance, and have a nice christmas for those celebrating it!

EDIT : Result expected :

   Timestamp           Average price
 ------------------------------------
 1 2016-09               2.9981 

Here the average price has been obtained by multiplying the forward quantity above with its related price

EDIT 2 : The output of dput(head(DatasHAIRPEPE3col, 10)) is the following

    structure(list(Timestamp = structure(c(1474862400, 1474862400, 
1474866000, 1474866000, 1474869600, 1474894800, 1475035200, 1475035200, 
1475042400, 1475042400), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    ForwardQuantity = c(3L, 7L, 3L, 1L, 2L, 4L, 1L, 1L, 1L, 1L
    ), TotalPriceUSDPerUnit = c(3.445, 3.445, 3.392, 3.392, 3.392, 
    2.8352, 2.8795, 2.9238, 2.9238, 2.9238)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 0

Views: 708

Answers (2)

NColl
NColl

Reputation: 757

Using the sample data provided in a previous answer (with an additional month added for demonstration) along with dplyr and anytime

library(tidyverse)
library(anytime)

Lines <- "
Timestamp               ForwardQuantity         UsdPricePerUnit
2016-09-26 04:00:00     3                 3.44
2016-09-26 04:00:00     7                 3.44
2016-09-26 05:00:00     3                 3.39
2016-10-26 05:00:00     1                 3.39
2016-10-26 05:00:00     2                 3.39
2016-10-26 06:00:00     4                 3.39"

DF <- read.csv(textConnection(gsub("  +", ",", Lines)))
DF %>%
  mutate(month = format(anydate((Timestamp)), "%Y-%m")) %>%
  group_by(month) %>%
  mutate(MonthlySpend = ForwardQuantity*UsdPricePerUnit) %>%
  summarise(QuanPerMon = sum(ForwardQuantity),
            SpendPerMon = sum(MonthlySpend)) %>%
  mutate(AveragePrice = SpendPerMon/QuanPerMon) %>%
  select(1,4)

# A tibble: 2 x 2
  month   AveragePrice
  <chr>          <dbl>
1 2016-09         3.43
2 2016-10         3.39

EDIT - New data added to question

This worked for me with your data

df %>%
  mutate(month = format(anydate((Timestamp)), "%Y-%m")) %>%
  group_by(month) %>%
  mutate(MonthlySpend = ForwardQuantity*TotalPriceUSDPerUnit) %>%
  summarise(QuanPerMon = sum(ForwardQuantity),
            SpendPerMon = sum(MonthlySpend)) %>%
  mutate(AveragePrice = SpendPerMon/QuanPerMon) %>%
  select(1,4)

# A tibble: 1 x 2
  month   AveragePrice
  <chr>          <dbl>
1 2016-09         3.24

Upvotes: 1

G. Grothendieck
G. Grothendieck

Reputation: 269491

Using the data shown reproducibly in the Note at the end

1) zoo convert the data to a zoo object aggregating it at the same time to class yearmon. That will give a zoo object Mean with one mean per year/month. You can either use that or convert it to a data.frame using fortify.zoo. This solution is probably more convenient than (2) below since we directly represent the year/month as a yearmon class object which can be plotted and manipulated in a logical manner.

library(zoo)
Mean <- read.zoo(DF, FUN = as.yearmon, aggregate = mean)
fortify.zoo(Mean)  # optional

giving this data frame:

     Index     Mean
1 Sep 2016 3.406667

You could now further manipulate, e.g. plot it using plot.zoo like this:

plot(Mean)

2) Base R Alternately, use the first 7 characters of each timestamp to represent the year/month and aggregate by that.

DF2 <- transform(DF, Timestamp = substring(Timestamp, 1, 7))
aggregate(UsdPricePerUnit ~ Timestamp, DF2, mean)

giving:

  Timestamp UsdPricePerUnit
1   2016-09        3.406667

Note

Lines <- "
Timestamp                    UsdPricePerUnit
2016-09-26 04:00:00                 3.44
2016-09-26 04:00:00                 3.44
2016-09-26 05:00:00                 3.39
2016-09-26 05:00:00                 3.39
2016-09-26 05:00:00                 3.39
2016-09-26 06:00:00                 3.39"
DF <- read.csv(textConnection(gsub("  +", ",", Lines)))

Upvotes: 1

Related Questions