Reputation: 33740
I have a log file with dates and sizes (of files). I would like to plot the bandwidth used per 1 minute and per 5 minutes. Input looks like this:
2014-08-08 06:37:34.610 639205638
2014-08-08 06:37:37.110 239205638
2014-08-08 06:38:58.810 635899318
2014-08-08 06:38:21.877 1420094614
2014-08-08 06:40:11.772 140034211
So I need to bin the values by date into 1-minute and 5-minute bins, sum each bin, average the sums by the number of minutes, and plot them against time.
But I have a feeling this has been done before and that I can use a generic plotting function.
Upvotes: 0
Views: 56
Reputation: 270195
It's not clear what "average them by number of minutes" means, but ignoring that, this bins the data into 1-minute and 5-minute bins, sums each bin, and plots the results. Note that the size column is read as "numeric" to avoid integer overflow. Omit `facet = NULL` if you want the two series shown in separate panels:
library(zoo)
library(ggplot2)
library(scales)
# read data from character variable Lines; Lines shown after graph
z <- read.zoo(text = Lines, index = 1:2, tz = "",
              colClasses = c(NA, NA, "numeric"))
ag1 <- aggregate(z, as.POSIXct(cut(time(z), "min")), sum)
ag5 <- aggregate(z, as.POSIXct(cut(time(z), "5 min")), sum)
autoplot(na.approx(cbind(ag1, ag5)), facet = NULL) +
  scale_x_datetime(breaks = "1 min", labels = date_format("%H:%M"))
Here is `Lines`:
Lines <- "2014-08-08 06:37:34.610 639205638
2014-08-08 06:37:37.110 239205638
2014-08-08 06:38:58.810 635899318
2014-08-08 06:38:21.877 1420094614
2014-08-08 06:45:11.772 140034211"
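If "average them by the number of minutes" is read as dividing each bin's total by the length of the bin, the result is an average transfer rate. The following is only a sketch of that reading in base R, not part of the zoo code above. It assumes the timestamps can be treated as UTC, aligns bins to the clock (06:35, 06:40, ...), and uses the illustrative names `bin_seconds`, `bin_start`, `totals` and `bytes_per_second`.

```r
# Sample rows copied from Lines above
Lines <- "2014-08-08 06:37:34.610 639205638
2014-08-08 06:37:37.110 239205638
2014-08-08 06:38:58.810 635899318
2014-08-08 06:38:21.877 1420094614
2014-08-08 06:45:11.772 140034211"

# Read sizes as numeric (not integer) to avoid overflow when summing
d <- read.table(text = Lines, col.names = c("date", "time", "bytes"),
                colClasses = c("character", "character", "numeric"))

# Assumption: timestamps are treated as UTC so the result is reproducible
stamp <- as.POSIXct(paste(d$date, d$time), tz = "UTC")

bin_seconds <- 300  # illustrative name; use 60 for 1-minute bins

# Floor each timestamp to the start of its clock-aligned bin
bin_start <- as.POSIXct(floor(as.numeric(stamp) / bin_seconds) * bin_seconds,
                        origin = "1970-01-01", tz = "UTC")

# Total bytes per bin, then the average rate over the bin length
totals <- tapply(d$bytes, format(bin_start, "%H:%M"), sum)
bytes_per_second <- totals / bin_seconds
print(bytes_per_second)
```

Bins that contain no rows do not appear in `totals`; if they should count as zero bandwidth, they would have to be filled in separately.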
Upvotes: 0
Reputation: 49820
You can do this easily with xts.
library(xts)

# read in the data
x <- read.table(text="2014-08-08 06:37:34.610 639205638
2014-08-08 06:37:37.110 239205638
2014-08-08 06:38:58.810 635899318
2014-08-08 06:38:21.877 1420094614
2014-08-08 06:40:11.772 140034211", stringsAsFactors=FALSE)
# convert to xts
xx <- xts(x[, 3], as.POSIXct(paste(x[,1], x[, 2])))
# find the 1 minute and 5 minute endpoints
ep1 <- endpoints(xx, "minutes", 1)
ep5 <- endpoints(xx, "minutes", 5)
period.sum(xx, ep1) # 1 minute sums
period.sum(xx, ep5) # 5 minute sums
More general (but slower):
period.apply(xx, ep1, sum)
For the last part of your question, just take the mean of these results:
mean(period.sum(xx, ep1))
#[1] 1024813140
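One caveat, offered as a sketch, not a correction: in the sample data no row falls in minute 06:39, so the mean above is taken over the three minutes that contain data. If idle minutes should count as zero, the total can be divided by the elapsed span instead. The base R snippet below shows both figures without needing xts; treating the timestamps as UTC and the names `minute_index`, `mean_over_active` and `mean_over_span` are assumptions made for illustration.

```r
# Same five rows as in the read.table() call above
stamp <- as.POSIXct(c("2014-08-08 06:37:34.610",
                      "2014-08-08 06:37:37.110",
                      "2014-08-08 06:38:58.810",
                      "2014-08-08 06:38:21.877",
                      "2014-08-08 06:40:11.772"), tz = "UTC")  # UTC is an assumption
bytes <- c(639205638, 239205638, 635899318, 1420094614, 140034211)

# Whole minutes since the epoch identify each 1-minute bin
minute_index <- floor(as.numeric(stamp) / 60)
per_minute <- tapply(bytes, minute_index, sum)

# Mean over minutes that contain at least one row
mean_over_active <- mean(per_minute)

# Mean over every minute from the first to the last bin, idle ones included
span_minutes <- diff(range(minute_index)) + 1
mean_over_span <- sum(bytes) / span_minutes

print(c(active = mean_over_active, span = mean_over_span))
```

Which of the two is appropriate depends on whether a minute with no log entries means "no traffic" or "no measurement".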
Upvotes: 1