Edward Gargan

Reputation: 153

Multi-line Time Series Chart in ggplot2

I have a data frame with two columns, 'host' and 'date', which describes a series of cyber attacks against a number of different servers on specific dates over a seven-month period.

Here's what the data looks like:

> china_atks %>% head(100)
                host       date
1     groucho-oregon 2013-03-03
2     groucho-oregon 2013-03-03
...
46 groucho-singapore 2013-03-03
48 groucho-singapore 2013-03-04
...

Here 'groucho-oregon', 'groucho-singapore', etc., are the hostnames of the servers targeted by an attack.

There are around 190,000 records, spanning 03/03/2013 to 08/09/2013, e.g.

> unique(china_atks$date)
  [1] "2013-03-03" "2013-03-04" "2013-03-05" "2013-03-06" "2013-03-07" "2013-03-08" "2013-03-09"
  [8] "2013-03-10" "2013-03-11" "2013-03-12" "2013-03-13" "2013-03-14" "2013-03-15" "2013-03-16"
 [15] "2013-03-17" "2013-03-18" "2013-03-19" "2013-03-20" "2013-03-21" "2013-03-22" "2013-03-23"
...

I'd like to create a multi-line time series chart that visualises how many attacks each individual server received each day over the range of dates, but I can't figure out how to pass the data to ggplot to achieve this. There are nine unique hostnames, and so the chart would show nine lines.

Thanks!

Upvotes: 2

Views: 809

Answers (2)

MKR

Reputation: 20085

ggplot2 can compute summary statistics itself, so one option is to let ggplot handle the counting. This should draw multiple lines, one for each group:

ggplot(df, aes(x = date, colour = host, group = host)) + geom_line(stat = "count")

Note: Make sure host is converted to a factor so that each line gets a discrete colour.
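
As a rough sketch of this approach on made-up data: the `toy` data frame and its values below are assumptions for illustration only, not the asker's records. Converting `date` with `as.Date()` is an extra step, added on the assumption that the column is stored as text, so that the x axis is treated as a real time axis.

```r
library(ggplot2)

# Hypothetical toy data standing in for china_atks: one row per attack.
toy <- data.frame(
  host = c("groucho-oregon", "groucho-oregon", "groucho-singapore",
           "groucho-oregon", "groucho-singapore", "groucho-singapore"),
  date = c("2013-03-03", "2013-03-03", "2013-03-03",
           "2013-03-04", "2013-03-04", "2013-03-04"),
  stringsAsFactors = FALSE
)

toy$host <- factor(toy$host)   # discrete colour per host
toy$date <- as.Date(toy$date)  # real dates give a proper time axis

# stat = "count" tallies the rows per date within each host group.
p <- ggplot(toy, aes(x = date, colour = host, group = host)) +
  geom_line(stat = "count")
```

Typing `p` at the console draws the chart. With the real data, `toy` would be replaced by `china_atks`, and nine hosts would give nine lines.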

Upvotes: 3

Rana Usman

Reputation: 1051

Here's one way to do this.

First, summarize the attack count for each host and date.

library(plyr)
df <- plyr::count(da,c("host", "date"))

Then plot the counts.

library(ggplot2)
# group by host so that each server gets its own line
ggplot(data = df, aes(x = date, y = freq, group = host)) +
  geom_line(aes(color = host))
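
If plyr is not available, the same per-host, per-date tally could be sketched in base R. This is an alternative to `plyr::count`, not part of the original answer, and the `attacks` data frame below is hypothetical sample data.

```r
# Hypothetical raw data: one row per attack (not the asker's real records).
attacks <- data.frame(
  host = c("groucho-oregon", "groucho-oregon", "groucho-singapore",
           "groucho-oregon", "groucho-singapore", "groucho-singapore"),
  date = as.Date(c("2013-03-03", "2013-03-03", "2013-03-03",
                   "2013-03-04", "2013-03-04", "2013-03-04"))
)

# Give every row a weight of 1, then sum the weights per host/date pair.
attacks$n <- 1L
counts <- aggregate(n ~ host + date, data = attacks, FUN = sum)

# Rename the tally column to match the 'freq' name used by plyr::count.
names(counts)[names(counts) == "n"] <- "freq"
counts
```

The resulting `counts` has one row per host and date that actually occurs, and can be plotted with `aes(x = date, y = freq, colour = host, group = host)`.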

Data

 da <- structure(list(host = structure(1:4, .Label = c("groucho-eu", 
    "groucho-oregon", "groucho-singapore", "groucho-tokyo"), class = "factor"), 
        date = structure(c(1L, 1L, 1L, 1L), .Label = "2013-03-03", class = "factor"), 
        freq = c(1L, 4L, 2L, 1L)), .Names = c("host", "date", "freq"
    ), row.names = c(NA, -4L), class = "data.frame")

Upvotes: 3
