Malcoholic

Reputation: 3

Inserting function argument as string within body of function

I am trying to write a function using aggregate() that will let me easily specify one or more variables to group by, along with the names of the resulting group columns.

data:

   FCST_VAR OBS_SID FCST_INIT_HOUR       ME
     WIND   00000             12    4.00000
     WIND   11111             12   -0.74948
     WIND   22222             12   -0.97792
     WIND   00000             00   -2.15822
     WIND   11111             00    0.94710
     WIND   22222             00   -2.28489

I can do this for a single variable to group by fairly easily:

aggregate.CNT <- function(input.data, aggregate.by) {

  # Calculate mean ME by aggregating specified variable
  # (aggregate.by is the column name as a string, e.g. "OBS_SID")
  output.data <- aggregate(input.data$ME,
                list(Station_ID = input.data[[aggregate.by]]),
                mean, na.rm = TRUE)
  output.data
}

However, I'm stumped on two things. Firstly, I'd like to be able to call the function with a name for the 'group by' column (instead of the default Group.1 or a hard-coded name), e.g. in the case of:

aggregate.CNT <- function(input.data, aggregate.by, group.name) {

  # Calculate mean ME by aggregating specified variable
  output.data <- aggregate(input.data$ME,
                list(group.name = input.data[[aggregate.by]]),
                mean, na.rm = TRUE)
}

But this results in the column name in the output being the literal text group.name rather than the value passed in the group.name argument.

Secondly, building on that: I want to optionally specify more than one variable to group by, each with a name. I tried using ... but that doesn't seem to be possible, since the additional arguments need to be in the form:

list(arg1 = input.data[[arg2]], arg3 = input.data[[arg4]])

And I don't think there's a way to place extra arguments into an arg3 = input.data[[arg4]] format. So I was wondering if there is a way to use an argument to insert a whole string into the function, e.g.:

aggregate.CNT <- function(input.data, aggregate.by.list) {

  # Calculate mean ME by aggregating specified variable
  output.data <- aggregate(input.data$ME,
                list(aggregate.by.list),
                mean, na.rm = TRUE)
}

aggregate.CNT(data, "Station_ID = data$OBS_SID, Init_Hour = data$FCST_INIT_HOUR")

If this isn't possible, suggestions for alternative methods are also greatly appreciated.

Thanks

Mal

Upvotes: 0

Views: 128

Answers (1)

G. Grothendieck

Reputation: 270338

Try this:

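aggregate.CNT <- function(data, by) {
    # Keep only ME and the grouping columns, then average ME within each group
    ag <- aggregate(ME ~ ., data[c("ME", by)], mean, na.rm = TRUE)
    # If by is a named vector, use its names as the output column names
    if (!is.null(names(by))) names(ag) <- c(names(by), "ME")
    ag
}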

Here is an example:

> DF <- data.frame(ME = 1:5, g = c(1, 1, 2, 2, 2), b = c(1, 1, 1, 2, 2))
> aggregate.CNT(DF, "g")
  g  ME
1 1 1.5
2 2 4.0
> aggregate.CNT(DF, c("g", "b"))
  g b  ME
1 1 1 1.5
2 2 1 3.0
3 2 2 4.5
> aggregate.CNT(DF, c(G = "g", B = "b"))
  G B  ME
1 1 1 1.5
2 2 1 3.0
3 2 2 4.5
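
Applied to the sample data from the question, the same call might look like the sketch below. The data frame name wind is made up for illustration, and storing OBS_SID and FCST_INIT_HOUR as character is an assumption made so that leading zeros such as "00000" and "00" are preserved.

```r
# Function from above, repeated so this snippet runs on its own
aggregate.CNT <- function(data, by) {
    ag <- aggregate(ME ~ ., data[c("ME", by)], mean, na.rm = TRUE)
    if (!is.null(names(by))) names(ag) <- c(names(by), "ME")
    ag
}

# Sample data from the question; 'wind' is a hypothetical name.
# ID and hour columns are kept as character (an assumption) so that
# leading zeros are not lost.
wind <- data.frame(
    FCST_VAR       = "WIND",
    OBS_SID        = rep(c("00000", "11111", "22222"), times = 2),
    FCST_INIT_HOUR = rep(c("12", "00"), each = 3),
    ME             = c(4.00000, -0.74948, -0.97792, -2.15822, 0.94710, -2.28489),
    stringsAsFactors = FALSE
)

# One grouping variable, renamed to Station_ID in the output
by.station <- aggregate.CNT(wind, c(Station_ID = "OBS_SID"))

# Two grouping variables, both renamed
by.both <- aggregate.CNT(wind, c(Station_ID = "OBS_SID",
                                 Init_Hour  = "FCST_INIT_HOUR"))
```

Since each station/hour pair occurs only once in this sample, by.both should simply reproduce the six ME values, while by.station averages the two hours for each station.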

ADDED: The by vector may be named, in which case its names are used as the names of the group columns in the output.
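
If you would rather stay with the list form of aggregate() used in the question, a rough sketch along the same lines is below. It builds the named list from the by vector instead of splicing a string into the call. The function name aggregate.CNT.list and the fallback to the original column name for unnamed entries are my own choices for illustration, not something from the question.

```r
# Hypothetical variant using the list interface of aggregate()
aggregate.CNT.list <- function(input.data, by) {
    # Output names: use names(by) where given, else the column name itself
    out.names <- names(by)
    if (is.null(out.names)) out.names <- rep("", length(by))
    out.names[out.names == ""] <- by[out.names == ""]

    # A data frame is a list of columns, so it can be passed directly
    # as the 'by' argument once it carries the desired names
    groups <- input.data[unname(by)]
    names(groups) <- out.names

    aggregate(input.data["ME"], groups, mean, na.rm = TRUE)
}

DF <- data.frame(ME = 1:5, g = c(1, 1, 2, 2, 2), b = c(1, 1, 1, 2, 2))

# Rename only the first grouping column; the second keeps its name "b"
res <- aggregate.CNT.list(DF, c(G = "g", "b"))
```

Passing input.data["ME"] (a one-column data frame) rather than input.data$ME keeps the value column named ME in the result.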

Upvotes: 1
