I have a data frame with two columns, 'host' and 'date', which describes a series of cyber attacks against a number of different servers on specific dates over a seven-month period.
Here's what the data looks like:
> china_atks %>% head(100)
                 host       date
1      groucho-oregon 2013-03-03
2      groucho-oregon 2013-03-03
...
46  groucho-singapore 2013-03-03
48  groucho-singapore 2013-03-04
...
Here 'groucho-oregon', 'groucho-singapore', etc., are the hostnames of the servers targeted by an attack.
There are around 190,000 records, spanning 03/03/2013 to 08/09/2013:
> unique(china_atks$date)
 [1] "2013-03-03" "2013-03-04" "2013-03-05" "2013-03-06" "2013-03-07" "2013-03-08" "2013-03-09"
 [8] "2013-03-10" "2013-03-11" "2013-03-12" "2013-03-13" "2013-03-14" "2013-03-15" "2013-03-16"
[15] "2013-03-17" "2013-03-18" "2013-03-19" "2013-03-20" "2013-03-21" "2013-03-22" "2013-03-23"
...
I'd like to create a multi-line time series chart that visualises how many attacks each server received per day across the date range, but I can't figure out how to pass the data to ggplot to achieve this. There are nine unique hostnames, so the chart would show nine lines.
Thanks!
The ggplot2 library can compute statistics itself, so one option is to let ggplot handle the counting. This should draw multiple lines (one for each group):
library(ggplot2)
ggplot(china_atks, aes(x = as.Date(date), colour = host, group = host)) + geom_line(stat = "count")
Note: Make sure host is converted to a factor so that each line gets a discrete colour.
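As a rough cross-check of what stat = "count" computes: stat_count() counts the rows at each x value within each group, so with x = date and group = host the line heights should match a host-by-date contingency table. The base-R sketch below builds that table; the data frame atks is a small hypothetical stand-in for china_atks, not real data.

```r
# Hypothetical stand-in for china_atks: one row per attack
atks <- data.frame(
  host = c("groucho-oregon", "groucho-oregon", "groucho-singapore",
           "groucho-singapore", "groucho-singapore", "groucho-oregon"),
  date = c("2013-03-03", "2013-03-03", "2013-03-03",
           "2013-03-04", "2013-03-04", "2013-03-04"),
  stringsAsFactors = FALSE
)

# Number of rows per (host, date) pair: these are the y values
# that each host's line should take on each day
counts <- table(atks$host, atks$date)
print(counts)
```

Each row of the table corresponds to one line in the chart and each column to one point on the x axis.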
Here's one way to do this.
First, summarize the count frequency by host and date.
library(plyr)
# Count the rows for each (host, date) pair; the result has columns host, date and freq
df <- plyr::count(da, c("host", "date"))
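If you'd rather not load plyr, a base-R sketch of the same summary uses as.data.frame(table(...)). One difference worth knowing: table() keeps host/date combinations with zero attacks (as a count of 0), whereas plyr::count() drops them, so with table() a host's line dips to 0 on a quiet day instead of jumping straight across it. The data frame atks is a hypothetical stand-in for the raw attack data.

```r
# Hypothetical stand-in for the raw attack data: one row per attack
atks <- data.frame(
  host = c("groucho-eu", "groucho-oregon", "groucho-oregon", "groucho-eu"),
  date = c("2013-03-03", "2013-03-03", "2013-03-03", "2013-03-04"),
  stringsAsFactors = FALSE
)

# Long-format counts: one row per (host, date), zero counts included
freq_df <- as.data.frame(table(host = atks$host, date = atks$date))
names(freq_df)[names(freq_df) == "Freq"] <- "freq"  # match plyr::count's column name

# table() returns factors; convert the dates back to Date for a time axis
freq_df$date <- as.Date(as.character(freq_df$date))
print(freq_df)
```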
Then do the plotting.
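library(ggplot2)
# group = host gives one line per host; group = 1 would join all hosts into a single line
ggplot(data = df, aes(x = date, y = freq, colour = host, group = host)) +
  geom_line()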
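For comparison, here is a base-graphics sketch (a swapped-in technique, not part of this answer): xtabs() reshapes the raw rows into a date-by-host matrix of counts, and matplot() then draws one line per column, i.e. one line per host. The data frame atks is hypothetical.

```r
# Hypothetical stand-in for the raw attack data: one row per attack
atks <- data.frame(
  host = c("groucho-eu", "groucho-oregon", "groucho-oregon",
           "groucho-eu", "groucho-tokyo"),
  date = c("2013-03-03", "2013-03-03", "2013-03-04",
           "2013-03-05", "2013-03-05"),
  stringsAsFactors = FALSE
)

# Rows = dates, columns = hosts, cells = number of attacks (0 if none)
wide <- xtabs(~ date + host, data = atks)
n_hosts <- ncol(wide)

# One line per column of the matrix, i.e. one line per host
matplot(unclass(wide), type = "l", lty = 1, col = seq_len(n_hosts),
        xaxt = "n", xlab = "date", ylab = "attacks")
axis(1, at = seq_len(nrow(wide)), labels = rownames(wide))
legend("topright", legend = colnames(wide), col = seq_len(n_hosts), lty = 1)
```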
Data
da <- structure(list(host = structure(1:4, .Label = c("groucho-eu",
"groucho-oregon", "groucho-singapore", "groucho-tokyo"), class = "factor"),
date = structure(c(1L, 1L, 1L, 1L), .Label = "2013-03-03", class = "factor"),
freq = c(1L, 4L, 2L, 1L)), .Names = c("host", "date", "freq"
), row.names = c(NA, -4L), class = "data.frame")
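In the sample data above, date is a factor (and in the question it prints as a character vector), so ggplot2 treats the x axis as discrete and spaces the days evenly regardless of gaps. A small sketch of converting it to Date, assuming the values are ISO YYYY-MM-DD strings as in the question; the two-row data frame here is hypothetical.

```r
# Hypothetical data with dates stored as a factor, as in the sample above
da <- data.frame(
  host = factor(c("groucho-eu", "groucho-oregon")),
  date = factor(c("2013-03-03", "2013-03-05")),
  freq = c(1L, 4L)
)

# Convert via character so the factor labels (not its integer codes) are parsed
da$date <- as.Date(as.character(da$date), format = "%Y-%m-%d")

# The gap is now measured in days rather than in factor levels
print(diff(da$date))
```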