poseid

Reputation: 7156

How to create histogram in R with CSV time data?

I have CSV data of a log for 24 hours that looks like this:

svr01,07:17:14,'[email protected]','8.3.1.35'
svr03,07:17:21,'[email protected]','82.15.1.35'
svr02,07:17:30,'[email protected]','2.15.1.35'
svr04,07:17:40,'[email protected]','2.1.1.35'

I read the data with tbl <- read.csv("logs.csv")

How can I plot this data in a histogram to see the number of hits per hour? Ideally, I would get 4 bars per hour, one each for svr01, svr02, svr03 and svr04.

Thank you for helping me here!

Upvotes: 3

Views: 11010

Answers (2)

Seb

Reputation: 5497

I am not sure I understood you correctly, so I will split my answer into parts. The first part shows how to convert your time into a vector you can use for plotting; the others show ways to plot it per server.

a) Converting your data into hours:

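  # df is the data frame; this assumes its time column is named timestamp
  df$timestamp <- strptime(df$timestamp, format = "%H:%M:%S")
  # extract the hour of day as a number (0 to 23)
  df$hours <- as.numeric(format(df$timestamp, format = "%H"))
  hist(df$hours)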

This gives you a histogram of hits over all servers. If you want to split the histogram by server, here is one way, though there are numerous others:

b) Making a histogram with ggplot2

  # install.packages("ggplot2")
  library(ggplot2)
  # one panel per server; binwidth = 1 gives one bar per hour
  ggplot(data = df) + geom_histogram(aes(x = hours), binwidth = 1) + facet_wrap(~ server)
  # or use a color instead
  ggplot(data = df) + geom_histogram(aes(x = hours, fill = server), binwidth = 1)

c) You could also use another package:

  library(plotrix)
  l <- split(df$hours, f = df$server)
  multhist(l)

The examples are given below. The third makes comparison easier, but I think ggplot2 simply looks better.

EDIT

Here is how these solutions look:

First solution: [image]

Second solution: [image]

Third solution: [image]
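The conversion in (a) can be tried end to end on the four sample rows from the question. This is only a sketch: the column names `server`, `timestamp`, `email` and `ip` are assumptions (the log has no header row), and the base R `table()` plus `barplot()` step is a swapped-in alternative to the packages above, not something the question or answer prescribes.

```r
# The four sample rows from the question, as text.
csv_text <- "svr01,07:17:14,'[email protected]','8.3.1.35'
svr03,07:17:21,'[email protected]','82.15.1.35'
svr02,07:17:30,'[email protected]','2.15.1.35'
svr04,07:17:40,'[email protected]','2.1.1.35'"

# header = FALSE because the log has no header line; column names are assumed.
df <- read.csv(text = csv_text, header = FALSE, quote = "'",
               col.names = c("server", "timestamp", "email", "ip"),
               stringsAsFactors = FALSE)

# Parse the time of day and keep only the hour as a number.
df$hours <- as.numeric(format(strptime(df$timestamp, format = "%H:%M:%S"), "%H"))

# Count hits per server and hour, then draw one bar per server within each hour.
hits <- table(server = df$server, hour = df$hours)
barplot(hits, beside = TRUE, legend.text = rownames(hits),
        xlab = "Hour of day", ylab = "Hits")
```

With a full 24-hour log, `hits` would have one column per hour, so each hour gets a group of four bars.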

Upvotes: 9

Paul Hiemstra

Reputation: 60944

An example dataset:

dat = data.frame(server = paste("svr", round(runif(1000, 1, 10)), sep = ""),
                 time = Sys.time() + sort(round(runif(1000, 1, 36000))))

The trick I use is to create a new variable which only specifies in which hour the hit was recorded:

dat$hr = strftime(dat$time, "%H")
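To see what this extracts, here is a small sketch with two made-up timestamps (they are not from the question's log). The time zone is fixed to UTC only so the result does not depend on the local machine.

```r
# Two illustrative timestamps in UTC (hypothetical values)
times <- as.POSIXct(c("2024-01-01 07:17:14", "2024-01-01 18:03:59"), tz = "UTC")

# strftime returns the hour as a zero-padded character string
hr <- strftime(times, "%H", tz = "UTC")
print(hr)              # "07" "18"

# convert to numbers when a continuous axis is needed
print(as.numeric(hr))  # 7 18
```

Because the result is a character vector, it behaves as a category on a plot axis unless it is converted with `as.numeric`.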

Now we can use some plyr magic:

library(plyr)
hits_hour = count(dat, vars = c("server","hr"))

And create the plot:

library(ggplot2)
ggplot(data = hits_hour) + geom_bar(aes(x = hr, y = freq, fill = server), stat = "identity", position = "dodge")

Which looks like:

[image]

I don't really like this plot. I'd be more in favor of:

ggplot(data = hits_hour) + geom_line(aes(x = as.numeric(hr), y = freq)) + facet_wrap(~ server, nrow = 1)

Which looks like:

[image]

Putting all the facets in one row allows easy comparison of the number of hits between the servers. This will look even better when using real data instead of my random data.
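If plyr is not available, the counting step can probably be done in base R as well. The following is a sketch under assumptions, not the method used above: the data is made up (fixed seed, four servers, a fixed start time in UTC), and `table()` keeps server and hour combinations with zero hits, which `count` leaves out.

```r
set.seed(1)  # only so the made-up data is reproducible

# Hypothetical data: 200 hits spread over 10 hours across four servers
start <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")
dat <- data.frame(
  server = paste0("svr", sample(1:4, 200, replace = TRUE)),
  time   = start + sort(sample(0:35999, 200, replace = TRUE))
)

# Hour of each hit as a two-digit string
dat$hr <- strftime(dat$time, "%H", tz = "UTC")

# Cross-tabulate and flatten to long format with columns server, hr, freq
hits_hour <- as.data.frame(table(server = dat$server, hr = dat$hr),
                           responseName = "freq", stringsAsFactors = FALSE)
head(hits_hour)
```

The resulting `hits_hour` has the same column names as the plyr version, so it should work with the same ggplot calls.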

Upvotes: 8
