Marcus Kazmierczak

Reputation: 216

How to graph requests per second from web log file using R

I'm trying to graph requests per second using our Apache log files. I've massaged the log down to a simple listing of timestamps, one entry per request.

04:02:28
04:02:28
04:02:28
04:02:29
...

I can't quite figure out how to get R to recognize these as times and aggregate them per second. Thanks for any help.

Upvotes: 2

Views: 1638

Answers (3)

Prasad Chalasani

Reputation: 20282

It seems to me that since your timestamps already have one-second granularity, all you need is a frequency count of the timestamps, plotted in the original time order. If timeStamps is your vector of timestamps, you would do:

plot(c( table( timeStamps ) ) )

I'm assuming you want to plot the number of log messages in each one-second interval over a certain period, and that all the HMS timestamps fall within a single day. The table function produces a frequency count of its argument, ordered by value, so zero-padded HH:MM:SS strings from one day come out in time order.
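
One thing to watch for: table only counts values that actually occur, so a second with no requests is simply missing from the plot rather than shown as zero. Below is a minimal sketch of one way to fill those gaps using base R only. The sample vector is made up for illustration (it deliberately skips 04:02:30), and it still assumes all timestamps fall within one day.

```r
# Hypothetical sample data; note that no request arrived at 04:02:30
timeStamps <- c("04:02:28", "04:02:28", "04:02:28", "04:02:29", "04:02:31")

# Parse HH:MM:SS into seconds since midnight
parts <- do.call(rbind, strsplit(timeStamps, ":", fixed = TRUE))
secs <- as.integer(parts[, 1]) * 3600L +
  as.integer(parts[, 2]) * 60L +
  as.integer(parts[, 3])

# Count requests for every second in the observed range, including empty ones
allSecs <- seq(min(secs), max(secs))
counts <- table(factor(secs, levels = allSecs))

# Rebuild HH:MM:SS labels for the x axis
labels <- sprintf("%02d:%02d:%02d",
                  allSecs %/% 3600L, (allSecs %% 3600L) %/% 60L, allSecs %% 60L)

# One vertical line per second, height = number of requests
plot(allSecs, as.vector(counts), type = "h", xaxt = "n",
     xlab = "Time", ylab = "Requests per second")
axis(1, at = allSecs, labels = labels)
```

Using a factor with explicit levels is what makes the empty second show up as a zero count.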

Upvotes: 1

Timo

Reputation: 5390

I'm not exactly sure how to do this correctly, but here is one possible way that may help you.

  1. Instead of strings, get the data from the database as UNIX timestamps, i.e. the number of seconds since 1970-01-01.

  2. Use hist(data) to plot a histogram, for example. Alternatively, use the melt command from the reshape2 package together with dcast (cast in the older reshape package) to create a data frame in which one column is the timestamp and another holds the number of transactions at that time.

  3. Use as.POSIXlt(your.unix.timestamps, origin="1970-01-01", tz="GMT") to convert the timestamps into date-time structures that R understands.

  4. Then add labels to the plot using the values from step 3, formatted with format.

Here's an example:

# original data
data.timestamps = c(1297977452, 1297977452, 1297977453, 1297977454, 1297977454, 1297977454, 1297977455, 1297977455)
data.unique.timestamps = unique(data.timestamps)

# get the labels
data.labels = format(as.POSIXlt(data.unique.timestamps, origin="1970-01-01", tz="GMT"), "%H:%M:%S")

# plot the histogram without axes (FALSE is safer than F, which can be reassigned)
hist(data.timestamps, axes=FALSE)

# add axes manually
axis(2)
axis(1, at=unique(data.timestamps), labels=data.labels)
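
The data-frame variant mentioned in step 2 can also be sketched in base R, without reshape2. This is only one way to do it, not necessarily what the melt/cast route would produce; the column names timestamp, requests and time are chosen here for illustration.

```r
# Same sample data as above
data.timestamps <- c(1297977452, 1297977452, 1297977453, 1297977454,
                     1297977454, 1297977454, 1297977455, 1297977455)

# Frequency count per timestamp, turned into a two-column data frame
counts <- as.data.frame(table(data.timestamps), stringsAsFactors = FALSE)
names(counts) <- c("timestamp", "requests")

# table() stores the values as character labels, so convert them back to numbers
counts$timestamp <- as.numeric(counts$timestamp)

# Human-readable label for each second
counts$time <- format(as.POSIXct(counts$timestamp, origin = "1970-01-01", tz = "GMT"),
                      "%H:%M:%S")

# One bar per second
barplot(counts$requests, names.arg = counts$time,
        xlab = "Time", ylab = "Requests per second")
```

Unlike hist, which picks its own bin boundaries, this counts each distinct second exactly once.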

-- Hope this helps

Upvotes: 1

Andrie

Reputation: 179518

The lubridate package makes working with dates and times very easy.

Here is an example using lubridate's hms() function. hms converts a character string into a data frame with separate columns for hours, minutes and seconds. There are similar functions such as mdy (month-day-year), dmy (day-month-year), ms (minutes-seconds)... you get the point.

library(lubridate)
data <- c("04:02:28", "04:02:28", "04:02:28", "04:02:29")
times <- hms(data)
times$second

[1] 28 28 28 29

At this point, times is a straightforward data frame (in the lubridate version used here; newer releases return an S4 Period object instead), and you can isolate any column you wish:

str(times)

Classes 'period' and 'data.frame':  4 obs. of  6 variables:
 $ year  : num  0 0 0 0
 $ month : num  0 0 0 0
 $ day   : num  0 0 0 0
 $ hour  : num  4 4 4 4
 $ minute: num  2 2 2 2
 $ second: num  28 28 28 29
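
To get from the parsed times to requests per second, one option is to convert each period to a total number of seconds and count the duplicates. The following is a rough sketch rather than a complete solution; it assumes a lubridate version that provides period_to_seconds(), and that all timestamps fall within a single day.

```r
library(lubridate)

data <- c("04:02:28", "04:02:28", "04:02:28", "04:02:29")

# Seconds since midnight for each request
secs <- period_to_seconds(hms(data))

# Number of requests observed in each distinct second
counts <- table(secs)

# Plot the counts, labelling the x axis with the original HH:MM:SS strings
# (sorting zero-padded strings lexically matches time order within one day)
at <- as.numeric(names(counts))
plot(at, as.vector(counts), type = "h", xaxt = "n",
     xlab = "Time", ylab = "Requests per second")
axis(1, at = at, labels = sort(unique(data)))
```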

Upvotes: 3
