Reputation: 1935
I'm just starting to use R, and want to create a generic function where I can specify the "fill" argument for the qplot() function.
graph_search <- function(group) {
qplot(time, data=subset, geom="density", fill=path, alpha=I(.5))
}
graph_search("path");
graph_search("code"); # does not work
Ideally I would replace the fill=path with fill=group, in the same way it works with:
data_max <- function(size = 5) {
print(tail(subset[order(subset$time),], n=size))
}
data_max(10);
I'm using this to look at some web server logs, where each record (request) has a time (how long it took to execute, in seconds), path (the URL requested without the query string), response code (e.g. 200, 301), the ID of the user logged in, etc.
The subset variable is created with a query such as:
subset <- subset(data, code != 302 & time > 0.2 & path!="/not/this/path/")
subset <- subset(data, code != 302 & grepl("^/admin/", path) & time > 0)
subset <- subset(data, code == 500)
And these work well with:
graph_frequency <- function() {
# hist(subset$time, xlab="time", col="lightblue", main="Web 1")
qplot(time, data=subset, geom="density", fill=code, alpha=I(.5))
}
graph_history <- function() {
# plot(time ~ timestamp, data=subset, type='h', xlab='date', ylab='time')
plot(subset$timestamp, subset$time, type='h', xlab='date', ylab='time')
}
And while this isn't relevant to the question (but feel free to comment on how to improve this), the Apache config uses:
LogFormat "%h %l %u [%{LOG_INFO}n] [%{%Y-%m-%d %H:%M:%S}t] [%D/%{TIME_INFO}n] \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" inc_info
With the non-Apache variables coming from PHP:
if (function_exists('apache_note')) {
apache_note('LOG_INFO', USER_ID);
apache_note('TIME_INFO', number_format(round((microtime(true) - FRAMEWORK_START), 4), 4));
}
Where R is started with:
library("stringr")
library("ggplot2")
And the access log is parsed with:
data_load <- function(log_path) {
data = read.table(log_path, sep=" ")
data$timestamp = as.POSIXct(strptime(paste(data[,5], data[,6]), '[%Y-%m-%d %H:%M:%S]'))
data$timings <- str_match(data[,7], "\\[([0-9]*)/(.*)\\]")[,c(2,3)]
data$info <- str_match(data[,4], "\\[(.*)\\]")[,2]
data$request <- str_match(data[,8], "([A-Z]+) (/.*) HTTP")[,c(2,3)]
data = cbind(
timestamp = data[13],
apache = data[,14][,1],
time = data[,14][,2],
ip = data[,1],
info = data[,15],
method = data[,16][,1],
url = data[,16][,2],
code = data[,9],
size = data[,10],
referrer = data[,11],
agent = data[,12])
data$time <- as.numeric(as.character(data$time))
data$info <- as.numeric(as.character(data$info))
data$code <- as.character(data$code)
data$path <- gsub("\\?.*", "", data$url)
# Drop apache/referrer/agent
data = data[,-c(2,10,11)]
# Drop url (optional)
data = data[,-c(6)]
return(data)
}
Upvotes: 1
Views: 274
Reputation: 1935
Thanks to @joran, this seems to work:
graph_search <- function(group) {
ggplot(subset, aes(x = time)) + geom_density(aes_string(fill=group), alpha=I(.5))
}
This uses ggplot() directly, rather than the shorthand qplot() (aka "quick plot"), and aes_string() accepts the column name as a string.
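For anyone on a newer ggplot2: aes_string() was soft-deprecated in ggplot2 3.0.0, and the .data pronoun does the same job inside a plain aes(). Below is a minimal sketch of that variant, not a definitive version. The `logs` data frame is made-up stand-in data with the same column names as in the question, and passing the data frame in as `df` (instead of reading the global `subset`) is my own assumption about how you might want to structure it.

```r
library(ggplot2)

# Hypothetical stand-in for the parsed access log (invented values);
# only the column names time, path and code come from the question.
set.seed(1)
logs <- data.frame(
  time = rexp(200, rate = 2),
  path = sample(c("/", "/admin/", "/login/"), 200, replace = TRUE),
  code = sample(c("200", "301", "500"), 200, replace = TRUE),
  stringsAsFactors = FALSE
)

# `group` is a column name given as a string; .data[[group]] looks that
# column up in the data frame handed to ggplot().
graph_search <- function(df, group) {
  ggplot(df, aes(x = time)) +
    geom_density(aes(fill = .data[[group]]), alpha = 0.5) +
    labs(fill = group)  # label the legend with the column name
}

p_path <- graph_search(logs, "path")
p_code <- graph_search(logs, "code")
```

Printing p_path or p_code (or calling graph_search() at the top level) draws the plot. Taking the data frame as an argument also means the function no longer depends on a global variable called subset.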
Upvotes: 2