Craig Francis

Specifying the qplot "fill" argument in an R function

I'm just starting to use R and want to write a generic function where I can pass in the column used for the "fill" argument of the qplot() function.

graph_search <- function(group) {
    qplot(time, data=subset, geom="density", fill=path, alpha=I(.5))
}
graph_search("path")
graph_search("code") # does not work: group is never used, so this still fills by path

Ideally I would replace fill=path with fill=group, the same way a function argument is used in:

data_max <- function(size = 5) {
    print(tail(subset[order(subset$time),], n=size))
}
data_max(10);
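The difference is that size is only ever used as a plain value, whereas fill has to refer to a column of the data. In base R, a column whose name is held in a string can be fetched with [[, which is what makes a string argument usable as a column name. A minimal sketch, assuming a hypothetical data_max_by() with a hypothetical column argument, run on a few made-up rows standing in for subset:

```r
# Made-up rows standing in for the 'subset' data frame (not real log data)
requests <- data.frame(
  time = c(0.8, 0.3, 1.2, 0.5),
  path = c("/a/", "/b/", "/a/", "/c/"),
  code = c("200", "500", "200", "301"),
  stringsAsFactors = FALSE
)

# Hypothetical variant of data_max(): 'column' is the name of the column to sort by
data_max_by <- function(df, column = "time", size = 5) {
  # df[[column]] looks the column up by the name held in the string
  tail(df[order(df[[column]]), ], n = size)
}

top2 <- data_max_by(requests, "time", size = 2)
print(top2)
```

The same call works with "path" or "code", because the column is chosen at run time from the string.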

Background

I'm using this to look at some web server logs, where each record (request) has a time (how long it took to execute, in seconds), a path (the requested URL without the query string), a response code (e.g. 200, 301), the ID of the logged-in user, etc.

The subset variable is created with a query such as:

subset <- subset(data, code != 302 & time > 0.2 & path!="/not/this/path/")
subset <- subset(data, code != 302 & grepl("^/admin/", path) & time > 0)
subset <- subset(data, code == 500)

And these work well with:

graph_frequency <- function() {
    # hist(subset$time, xlab="time", col="lightblue", main="Web 1")
    qplot(time, data=subset, geom="density", fill=code, alpha=I(.5))
}
graph_history <- function() {
    # plot(time ~ timestamp, data=subset, type='h', xlab='date', ylab='time')
    plot(subset$timestamp, subset$time, type='h', xlab='date', ylab='time')
}
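The subset() filters above can be tried on a few made-up rows before pointing them at real logs. This is a sketch: the rows are hypothetical, and the results get their own names instead of reusing subset. Because code is a character column (data_load() converts it with as.character()), code != 302 compares against the string "302".

```r
# Made-up rows standing in for the parsed log (not real data)
logs <- data.frame(
  time = c(0.10, 0.45, 0.90, 0.30, 2.00),
  path = c("/", "/admin/users/", "/not/this/path/", "/admin/", "/search/"),
  code = c("200", "200", "200", "302", "500"),
  stringsAsFactors = FALSE
)

# Slow, non-redirect requests, excluding one path
slow <- subset(logs, code != 302 & time > 0.2 & path != "/not/this/path/")

# Admin-area requests, matched with a regular expression on the path
admin <- subset(logs, code != 302 & grepl("^/admin/", path) & time > 0)

# Server errors only
errors <- subset(logs, code == 500)

print(slow)
```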

Extra info

This isn't relevant to the question (but feel free to comment on how to improve it). The Apache config uses:

LogFormat "%h %l %u [%{LOG_INFO}n] [%{%Y-%m-%d %H:%M:%S}t] [%D/%{TIME_INFO}n] \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" inc_info

The non-Apache variables (LOG_INFO and TIME_INFO) are set from PHP:

if (function_exists('apache_note')) {
    apache_note('LOG_INFO', USER_ID);
    apache_note('TIME_INFO', number_format(round((microtime(true) - FRAMEWORK_START), 4), 4));
}
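To see how read.table(sep=" ") splits a line in this format, here is a sketch on a made-up line (the IP, user ID, timings, URL and user agent are all hypothetical). The quoted request, referrer and user agent each stay in one column, giving the 12 columns that the data_load() function indexes by position.

```r
# A made-up line in the inc_info format (not real log data)
line <- paste(
  '192.0.2.10 - - [42] [2014-03-01 12:34:56] [215000/0.2043]',
  '"GET /admin/users/?page=2 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"'
)

# Same split as data_load(), but reading from a string instead of a file
fields <- read.table(text = line, sep = " ", stringsAsFactors = FALSE)

ncol(fields)  # 12
fields$V8     # the request line, kept whole because it was quoted

# The %t field is split across V5 and V6, so the two halves are pasted back together;
# tz = "UTC" is an assumption added here so the result does not depend on the local zone
ts <- as.POSIXct(strptime(paste(fields$V5, fields$V6), "[%Y-%m-%d %H:%M:%S]", tz = "UTC"))
```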

Where R is started with:

library("stringr")
library("ggplot2")

And the access log is parsed with:

data_load <- function(log_path) {

    # Split each line on spaces; the quoted request, referrer and agent stay whole
    data = read.table(log_path, sep=" ")

    # Columns 5 and 6 hold "[date" and "time]" from the %t field
    data$timestamp = as.POSIXct(strptime(paste(data[,5], data[,6]), '[%Y-%m-%d %H:%M:%S]'))
    # Column 7 is "[%D/TIME_INFO]": Apache microseconds and PHP seconds
    data$timings <- str_match(data[,7], "\\[([0-9]*)/(.*)\\]")[,c(2,3)]
    # Column 4 is "[LOG_INFO]", the logged-in user ID
    data$info <- str_match(data[,4], "\\[(.*)\\]")[,2]
    # Column 8 is the request line: method and URL
    data$request <- str_match(data[,8], "([A-Z]+) (/.*) HTTP")[,c(2,3)]

    # Columns 13-16 are the timestamp, timings, info and request added above
    data = cbind(
        timestamp = data[13],
        apache = data[,14][,1],
        time = data[,14][,2],
        ip = data[,1],
        info = data[,15],
        method = data[,16][,1],
        url = data[,16][,2],
        code = data[,9],
        size = data[,10],
        referrer = data[,11],
        agent = data[,12])

    data$time <- as.numeric(as.character(data$time))
    data$info <- as.numeric(as.character(data$info))
    data$code <- as.character(data$code)
    # Strip the query string to get the path
    data$path <- gsub("\\?.*", "", data$url)

    # Drop apache/referrer/agent
    data = data[,-c(2,10,11)]

    # Drop url (optional)
    data = data[,-c(6)]

    return(data)

}
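In data_load(), column 14 holds both timings: time is taken from the second capture group (the PHP TIME_INFO value, in seconds), while apache (Apache's %D, in microseconds) is dropped. A base-R sketch of that split on made-up values, using regexec() and regmatches() in place of stringr::str_match() so it runs without extra packages:

```r
# Made-up values of the "[%D/%{TIME_INFO}n]" field (not real log data)
field <- c("[215000/0.2043]", "[1200000/1.1875]")

# Same pattern data_load() passes to str_match()
parts <- regmatches(field, regexec("\\[([0-9]*)/(.*)\\]", field))

# Each element is c(full match, group 1, group 2); keep the two capture groups
timings <- do.call(rbind, lapply(parts, function(m) m[2:3]))

apache_us <- as.numeric(timings[, 1])  # %D: Apache's time in microseconds
php_s     <- as.numeric(timings[, 2])  # TIME_INFO: PHP's time in seconds
apache_s  <- apache_us / 1e6           # Apache's time converted to seconds for comparison
```

Comparing apache_s with php_s shows how much of each request was spent outside the PHP framework.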


Answers (1)

Craig Francis

Thanks to @joran, this seems to work:

graph_search <- function(group) {
    ggplot(subset, aes(x = time)) + geom_density(aes_string(fill=group), alpha=I(.5))
}

This uses ggplot() directly rather than the qplot() ("quick plot") shorthand, and aes_string() accepts the column name as a string.
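In ggplot2 3.0.0 and later, aes_string() is soft-deprecated, and the .data pronoun does the same job inside a normal aes() call. A sketch, assuming a hypothetical df argument (the original reads the global subset) and a few made-up rows:

```r
library(ggplot2)

# Made-up rows standing in for the question's 'subset' data frame
requests <- data.frame(
  time = c(0.21, 0.35, 0.50, 0.80, 1.10, 0.40),
  path = c("/a/", "/a/", "/b/", "/b/", "/a/", "/b/"),
  code = c("200", "500", "200", "200", "500", "200"),
  stringsAsFactors = FALSE
)

graph_search <- function(df, group) {
  # .data[[group]] looks the column up by the name held in the string
  ggplot(df, aes(x = time, fill = .data[[group]])) +
    geom_density(alpha = 0.5)  # outside qplot(), a fixed alpha needs no I()
}

p_path <- graph_search(requests, "path")
p_code <- graph_search(requests, "code")
print(p_code)
```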
