I'd like to generate monthly plots from a 10-minute time series. The start and end of the time series differ for every data set, so the solution should work generally. The plots should also be generated for several variables.
I have a pretty ugly solution with one loop over the years and another over the months, which works but also produces some empty extra plots. I hope the code makes it clearer.
library(dplyr)
library(readr)
library(tidyverse)
library(ggplot2)
library(lubridate)
#test data:
TDF <- tibble(DATE = seq(make_datetime(2007, 09, 23, 06, 00), make_datetime(2008, 07, 05, 23, 00), by = 600),
              V1 = round(runif(length(DATE)), 2),
              V2 = round(runif(length(DATE)), 2),
              V3 = round(runif(length(DATE)), 2)
)
for (year in min(year(TDF$DATE)) : max(year(TDF$DATE))) {
  for (mon in min(month(TDF$DATE)) : max(month(TDF$DATE))) {
    for (var in c("V1", "V2", "V3")) {
      filename <- paste0("Abb/", var, "_", year, "-", mon, "_ZR.png")
      png(filename, width = 1800, height = 900, res = 200)
      p <- ggplot(TDF[year(TDF$DATE) == year & month(TDF$DATE) == mon, ])
      p <- p + geom_line(aes_string("DATE", paste0(var)))
      print(p)
      graphics.off()
    }
  }
}
So, there must be a better way. I'm now struggling with this (same test data):
yearmonmin <- TDF$DATE %>% min() %>% floor_date(unit = "month")
yearmonmax <- TDF$DATE %>% max() %>% ceiling_date(unit = "month")
seq(yearmonmin, yearmonmax, by = "month")
for (yearmon in seq(yearmonmin, yearmonmax, by = "month")) {
  print(yearmon)
}
This is really confusing me because
> seq(yearmonmin, yearmonmax, by = "month")
[1] "2007-09-01 UTC" "2007-10-01 UTC" "2007-11-01 UTC" "2007-12-01 UTC" "2008-01-01 UTC" "2008-02-01 UTC" "2008-03-01 UTC" "2008-04-01 UTC"
[9] "2008-05-01 UTC" "2008-06-01 UTC" "2008-07-01 UTC" "2008-08-01 UTC"
BUT
> for (yearmon in seq(yearmonmin, yearmonmax, by = "month")) {
+ print(yearmon)
+ }
[1] 1188604800
[1] 1191196800
[1] 1193875200
[1] 1196467200
[1] 1199145600
[1] 1201824000
[1] 1204329600
[1] 1207008000
[1] 1209600000
[1] 1212278400
[1] 1214870400
[1] 1217548800
I've already tried seq.Date and two days of other alternatives not worth showing here...
I heard it's best to avoid loops in R. So... anybody?
We use melt to reshape the data from wide to long, so we can operate on V1, V2 and V3 as a single column. Then we create month groups. I've done all of this using the dplyr chaining operator (%>%).
Now that we have the data in the form we want, we use lapply to create a time series plot for each of the original value columns for each month. The split function splits the data frame into a separate data frame for each month so that we can create separate plots for each month of data. This combination of lapply and split avoids explicit loops.
library(lubridate)
library(ggplot2)
library(reshape2)
library(dplyr)
# Reshape to long and add month grouping
TDF = TDF %>%
  melt(id.vars = "DATE") %>%
  arrange(DATE) %>%
  mutate(month = paste0(month(DATE, label=TRUE, abbr=TRUE), " ", year(DATE)),
         # Keep the levels in chronological order rather than alphabetical
         month = factor(month, levels=unique(month)))

# Create a list of plots by month
pl = lapply(split(TDF, TDF$month), function(df) {
  ggplot(df, aes(DATE, value)) +
    geom_line(aes(group=variable)) +
    facet_grid(. ~ variable) +
    theme(axis.text.x = element_text(angle=-90, hjust=0, vjust=0.5))
})
You now have a list where each list element contains a plot for one month of data. For example:
pl[["Sep 2007"]]
You can save these plots to individual files, or you can lay them out on a single page and save that. Or, if you save as a PDF, you can create a multi-page PDF with a single plot on each page.
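As a rough sketch of the saving step, the code below writes one PNG per list element and then a single multi-page PDF. The list pl_demo, the toy data, and the output folder under tempdir() are stand-ins invented for this illustration; with the real data you would pass pl instead. The size of 9 x 4.5 inches at 200 dpi is chosen to match the 1800 x 900 pixel files from the question.

```r
library(ggplot2)

# Hypothetical stand-in for the list `pl`: two small named plots.
toy <- data.frame(x = 1:10, y = (1:10)^2)
pl_demo <- list(
  "Sep 2007" = ggplot(toy, aes(x, y)) + geom_line(),
  "Oct 2007" = ggplot(toy, aes(x, y)) + geom_point()
)

# Assumed output folder; replace with your own directory.
out_dir <- file.path(tempdir(), "monthly_plots")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# One PNG per list element; spaces in the names become underscores.
png_files <- file.path(out_dir, paste0(gsub(" ", "_", names(pl_demo)), ".png"))
invisible(Map(function(f, p) {
  ggsave(f, plot = p, width = 9, height = 4.5, dpi = 200)
}, png_files, pl_demo))

# One multi-page PDF: each printed plot starts a new page.
pdf_file <- file.path(out_dir, "all_months.pdf")
pdf(pdf_file, width = 9, height = 4.5)
invisible(lapply(pl_demo, print))
invisible(dev.off())
```

Because the file names are derived from names(pl_demo), the same pattern should carry over to a list split by both variable and month.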
If you want V1, V2, and V3 in separate plots, you can do something similar to the code above, but with a slight change to the split call so that it splits by both month and variable:
pl = lapply(split(TDF, paste(TDF$variable, TDF$month)), function(df) {
  ggplot(df, aes(DATE, value)) +
    geom_line(aes(group=variable)) +
    facet_grid(. ~ variable) +
    theme(axis.text.x = element_text(angle=-90, hjust=0, vjust=0.5))
})
Now each element of the list is a single plot for one variable in one month:
pl[["V1 Apr 2008"]]
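As for the bare numbers printed by the for loop in the question: as far as I know, for() iterates over the underlying values and drops the class attribute, so each POSIXct element arrives as its count of seconds since 1970-01-01. A minimal sketch of the effect, and of one way around it by looping over indices instead, is below. The name month_starts and the shortened three-month range are made up for this example.

```r
# Month starts as POSIXct, similar to the sequence in the question.
month_starts <- seq(as.POSIXct("2007-09-01", tz = "UTC"),
                    as.POSIXct("2007-11-01", tz = "UTC"), by = "month")

# Looping over the values directly: each element has lost its class.
raw_classes <- character(0)
for (m in month_starts) raw_classes <- c(raw_classes, class(m))

# Looping over indices: subsetting with [i] keeps the POSIXct class,
# so date functions such as format() still work.
labels <- vapply(seq_along(month_starts),
                 function(i) format(month_starts[i], "%Y-%m"),
                 character(1))

print(raw_classes)  # "numeric" "numeric" "numeric"
print(labels)       # "2007-09" "2007-10" "2007-11"
```

The split and lapply approach above sidesteps this entirely, because the dates never pass through for().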