Pelle

Reputation: 257

Plot monthly figures from 10-minute data

I'd like to generate monthly plots from a 10-minute time series. The start and end of the time series differ for every data set, so the solution should work generally. Additionally, plots should be generated for several variables.

I have a pretty ugly solution with one loop over the years and another over the months. It works, but it also produces some empty extra plots. I hope the code makes this clearer.

library(tidyverse)   # attaches dplyr, readr and ggplot2, among others
library(lubridate)

# test data: one row every 10 minutes (600 s) with three random variables
TDF <- tibble(DATE = seq(make_datetime(2007, 9, 23, 6, 0), make_datetime(2008, 7, 5, 23, 0), by = 600),
              V1 = round(runif(length(DATE)), 2),
              V2 = round(runif(length(DATE)), 2),
              V3 = round(runif(length(DATE)), 2)
)

dir.create("Abb", showWarnings = FALSE)  # the "Abb" folder must exist before png() writes to it

# one PNG per year x month number x variable
for (year in min(year(TDF$DATE)):max(year(TDF$DATE))) {
  for (mon in min(month(TDF$DATE)):max(month(TDF$DATE))) {
    for (var in c("V1", "V2", "V3")) {
      filename <- paste0("Abb/", var, "_", year, "-", mon, "_ZR.png")
      png(filename, width = 1800, height = 900, res = 200)
      p <- ggplot(TDF[year(TDF$DATE) == year & month(TDF$DATE) == mon, ])
      p <- p + geom_line(aes(DATE, .data[[var]]))  # .data[[var]] replaces the deprecated aes_string()
      print(p)
      dev.off()
    }
  }
}
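
A likely source of the empty plots, offered as a sketch rather than a definitive diagnosis: the month loop runs from min(month(...)) to max(month(...)) over the whole series, which is 1 to 12 for this test data, and it does so for every year. The base-R snippet below rebuilds the test time stamps and counts which year/month pairs the loops visit versus which actually contain data; the names dates, visited, present_keys and empty_keys are illustrative only.

```r
# Rebuild the question's 10-minute time stamps (UTC, 600 s steps)
dates <- seq(as.POSIXct("2007-09-23 06:00", tz = "UTC"),
             as.POSIXct("2008-07-05 23:00", tz = "UTC"), by = 600)
yr  <- as.integer(format(dates, "%Y", tz = "UTC"))
mon <- as.integer(format(dates, "%m", tz = "UTC"))

# Year/month pairs the nested loops visit: every month number between the
# overall min and max month (1..12 here), for every year
visited <- expand.grid(year = min(yr):max(yr), mon = min(mon):max(mon))
visited_keys <- paste(visited$year, visited$mon)

# Year/month pairs that actually contain observations
present_keys <- unique(paste(yr, mon))

# Pairs that get a PNG file but have no rows to draw
empty_keys <- setdiff(visited_keys, present_keys)
length(visited_keys)  # 24
length(present_keys)  # 11
length(empty_keys)    # 13
```

Iterating only over the year/month pairs that occur in the data, rather than over the full year and month ranges, would avoid those extra files.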

So, there must be a better way. I'm now struggling with this (same test data):

yearmonmin <- TDF$DATE %>% min() %>% floor_date(unit = "month") 
yearmonmax <- TDF$DATE %>% max() %>% ceiling_date(unit = "month")

seq(yearmonmin, yearmonmax, by = "month")

for (yearmon in seq(yearmonmin, yearmonmax, by = "month")) {
  print(yearmon)
}

This is really confusing me because

> seq(yearmonmin, yearmonmax, by = "month")
 [1] "2007-09-01 UTC" "2007-10-01 UTC" "2007-11-01 UTC" "2007-12-01 UTC" "2008-01-01 UTC" "2008-02-01 UTC" "2008-03-01 UTC" "2008-04-01 UTC"
 [9] "2008-05-01 UTC" "2008-06-01 UTC" "2008-07-01 UTC" "2008-08-01 UTC"    

BUT

> for (yearmon in seq(yearmonmin, yearmonmax, by = "month")) {
+   print(yearmon)
+ }
[1] 1188604800
[1] 1191196800
[1] 1193875200
[1] 1196467200
[1] 1199145600
[1] 1201824000
[1] 1204329600
[1] 1207008000
[1] 1209600000
[1] 1212278400
[1] 1214870400
[1] 1217548800
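
The numbers are the seconds since 1970-01-01 00:00 UTC that a POSIXct vector stores internally. A for loop iterates over those bare values and drops the class attribute, so yearmon is a plain number inside the loop, while printing the whole vector still uses the POSIXct print method. The base-R sketch below (month_starts, idx_labels and num_labels are illustrative names) shows the effect and two common workarounds: loop over positions with seq_along(), or convert each number back with as.POSIXct().

```r
# Month starts, as produced by floor_date()/ceiling_date() in the question
month_starts <- seq(as.POSIXct("2007-09-01", tz = "UTC"),
                    as.POSIXct("2008-08-01", tz = "UTC"), by = "month")

# for() hands out the underlying doubles, so the POSIXct class is gone
loop_classes <- character(0)
for (m in month_starts) loop_classes <- c(loop_classes, class(m))
unique(loop_classes)  # "numeric"

# Workaround 1: iterate over positions; month_starts[i] keeps its class
idx_labels <- character(0)
for (i in seq_along(month_starts)) {
  idx_labels <- c(idx_labels, format(month_starts[i], "%Y-%m"))
}

# Workaround 2: rebuild a date-time from the number (seconds since the epoch)
num_labels <- character(0)
for (m in month_starts) {
  num_labels <- c(num_labels,
                  format(as.POSIXct(m, origin = "1970-01-01", tz = "UTC"), "%Y-%m"))
}

idx_labels  # 12 labels, "2007-09" through "2008-08"
```

With lubridate loaded, as_datetime(yearmon) performs the same number-to-date-time conversion as the second workaround.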

I've already tried seq.Date and spent two days on other alternatives that aren't worth showing here...

I heard it's best to avoid loops in R. So... anybody?

Upvotes: 1

Views: 59

Answers (1)

eipi10

Reputation: 93861

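We use melt (from reshape2) to reshape the data from wide to long format, so that V1, V2 and V3 become a single value column, with a variable column recording which original column each value came from. Then we create month groups. All of this is done in one chain with the pipe operator (%>%), which dplyr re-exports from magrittr.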

Now that we have the data in the form we want, we use lapply to create a time series plot for each of the original value columns for each month. The split function splits the data frame into a separate data frame for each month so that we can create separate plots for each month of data. This combination of lapply and split avoids explicit loops.
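
To make the split/lapply pattern concrete, here is a minimal base-R sketch on a made-up four-row data frame (toy, parts and sums are illustrative names). split() returns a named list with one data frame per group, and lapply() runs the same function on each piece; in the answer that function builds a ggplot, here it just sums a column.

```r
# Made-up data: a grouping column g and a value column x
toy <- data.frame(g = factor(c("a", "b", "a", "c")), x = c(1, 2, 3, 4))

# One data frame per level of g, stored in a list named by the levels
parts <- split(toy, toy$g)
names(parts)  # "a" "b" "c"

# The same function applied to each piece, with no explicit loop
sums <- lapply(parts, function(d) sum(d$x))
unlist(sums)  # a = 4, b = 2, c = 4
```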

library(lubridate)
library(ggplot2)
library(reshape2)
library(dplyr)

# Reshape to long and add month grouping
TDF = TDF %>% melt(id.vars = "DATE") %>%
  arrange(DATE) %>%
  mutate(month = paste0(month(DATE, label = TRUE, abbr = TRUE), " ", year(DATE)),
         # levels in order of first appearance, i.e. chronological after arrange()
         month = factor(month, levels = unique(month)))

# Create a list of plots by month
pl = lapply(split(TDF, TDF$month), function(df) {
  ggplot(df, aes(DATE, value)) +
    geom_line(aes(group = variable)) +
    facet_grid(. ~ variable) +
    theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust = 0.5))
})

You now have a list where each list element contains a plot for one month of data. For example:

pl[["Sep 2007"]] 

You can save these plots to individual files, or you can lay them out on a single page and save that. Or, if you save as a PDF, you can create a multi-page PDF with a single plot on each page.
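
A sketch of the individual-files and multi-page-PDF options, assuming ggplot2 is installed. The two-element pl below is a hypothetical stand-in for the list built above, and the output folder under tempdir() is an arbitrary choice. ggsave() writes one PNG per list element; 9 x 4.5 inches at 200 dpi gives the same 1800 x 900 pixels as the question's png() call. For the PDF, each plot printed while the pdf() device is open goes on its own page.

```r
library(ggplot2)

# Hypothetical stand-in for the answer's list of monthly plots
pl <- list(
  "Sep 2007" = ggplot(data.frame(x = 1:3, y = c(2, 1, 3)), aes(x, y)) + geom_line(),
  "Oct 2007" = ggplot(data.frame(x = 1:3, y = c(1, 3, 2)), aes(x, y)) + geom_line()
)

out_dir <- file.path(tempdir(), "Abb")
dir.create(out_dir, showWarnings = FALSE)

# One PNG per month; the list name becomes the file name ("Sep_2007.png")
for (nm in names(pl)) {
  ggsave(file.path(out_dir, paste0(gsub(" ", "_", nm), ".png")),
         plot = pl[[nm]], width = 9, height = 4.5, dpi = 200)
}

# One multi-page PDF: each print() on the open pdf device starts a new page
pdf(file.path(out_dir, "all_months.pdf"), width = 9, height = 4.5)
invisible(lapply(pl, print))
invisible(dev.off())
```

A plain for loop is used for the PNGs because ggsave() is called only for its side effect of writing a file; Map() over pl and names(pl) would work just as well.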

If you want V1, V2, and V3 in separate plots, you can do something similar to the code above, but with a slight change to the split function to split by both month and variable:

pl = lapply(split(TDF, paste(TDF$variable, TDF$month)), function(df) {
  ggplot(df, aes(DATE, value)) +
    geom_line(aes(group=variable)) +
    facet_grid(. ~ variable) +
    theme(axis.text.x = element_text(angle=-90, hjust=0, vjust=0.5))
})

Now each list element is a single plot of one variable for one month:

pl[["V1 Apr 2008"]]
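
One side effect worth knowing: paste() produces character keys, and split() sorts those alphabetically, so the list elements come out in alphabetical rather than calendar order. Passing a list of the two factors instead should let split() combine them with interaction(), which follows the factor levels and therefore keeps the chronological month order set up earlier, while producing the same "V1 Apr 2008"-style names when sep = " ". The base-R sketch uses a made-up six-row data frame (long, by_paste and by_factors are illustrative names).

```r
# Made-up long-format data: two variables over three months,
# with month levels in chronological order
long <- data.frame(
  variable = factor(rep(c("V1", "V2"), times = 3)),
  month    = factor(rep(c("Sep 2007", "Oct 2007", "Apr 2008"), each = 2),
                    levels = c("Sep 2007", "Oct 2007", "Apr 2008")),
  value    = 1:6
)

# Character keys from paste() are sorted alphabetically by split()
by_paste <- split(long, paste(long$variable, long$month))
names(by_paste)[1:3]  # "V1 Apr 2008" "V1 Oct 2007" "V1 Sep 2007"

# A list of factors is combined with interaction(), which follows the factor
# levels; drop = TRUE skips variable/month combinations without rows
by_factors <- split(long, list(long$variable, long$month), sep = " ", drop = TRUE)
names(by_factors)
# "V1 Sep 2007" "V2 Sep 2007" "V1 Oct 2007" "V2 Oct 2007" "V1 Apr 2008" "V2 Apr 2008"
```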

Upvotes: 1
