Reputation: 7156
I have CSV data of a log for 24 hours that looks like this:
svr01,07:17:14,'[email protected]','8.3.1.35'
svr03,07:17:21,'[email protected]','82.15.1.35'
svr02,07:17:30,'[email protected]','2.15.1.35'
svr04,07:17:40,'[email protected]','2.1.1.35'
I read the data with tbl <- read.csv("logs.csv")
How can I plot this data in a histogram to see the number of hits per hour? Ideally, I would get 4 bars representing hits per hour per srv01, srv02, srv03, srv04.
Thank you for helping me here!
Upvotes: 3
Views: 11010
Reputation: 5497
I don't know if I understood you right, so I will split my answer in two parts. The first part is how to convert your time into a vector you can use for plotting.
a) Converting your data into hours:
#df being the dataframe
df$timestamp <- strptime(df$timestamp, format="%H:%M:%S")
df$hours <- as.numeric(format(df$timestamp, format="%H"))
hist(df$hours)
This gives you a histogram of hits over all servers. If you want to split the histograms this is one way but of course there are numerous others:
b) Making a histogram with ggplot2
#install.packages("ggplot2")
require(ggplot2)
ggplot(data=df) + geom_histogram(aes(x=hours), bin=1) + facet_wrap(~ server)
# or use a color instead
ggplot(data=df) + geom_histogram(aes(x=hours, fill=server), bin=1)
c) You could also use another package:
require(plotrix)
l <- split(df$hours, f=df$server)
multhist(l)
The examples are given below. The third makes comparison easier but ggplot2
simply looks better I think.
EDIT
Here is how thes solutions would look like
first solution:
second solution:
third solution:
Upvotes: 9
Reputation: 60944
An example dataset:
dat = data.frame(server = paste("svr", round(runif(1000, 1, 10)), sep = ""),
time = Sys.time() + sort(round(runif(1000, 1, 36000))))
The trick I use is to create a new variable which only specifies in which hour the hit was recorded:
dat$hr = strftime(dat$time, "%H")
Now we can use some plyr
magick:
hits_hour = count(dat, vars = c("server","hr"))
And create the plot:
ggplot(data = hits_hour) + geom_bar(aes(x = hr, y = freq, fill = server), stat="identity", position = "dodge")
Which looks like:
I don't really like this plot, I'd be more in favor of:
ggplot(data = hits_hour) + geom_line(aes(x = as.numeric(hr), y = freq)) + facet_wrap(~ server, nrow = 1)
Which looks like:
Putting all the facets in one row allows easy comparison of the number of hits between the servers. This will look even better when using real data instead of my random data.
Upvotes: 8