Reputation: 3
I am trying to write a function using aggregate() that will let me easily specify one or more variables to group by, along with the names those grouping columns should have in the output.
Data:
FCST_VAR  OBS_SID  FCST_INIT_HOUR  ME
WIND      00000    12               4.00000
WIND      11111    12              -0.74948
WIND      22222    12              -0.97792
WIND      00000    00              -2.15822
WIND      11111    00               0.94710
WIND      22222    00              -2.28489
I can do this for a single variable to group by fairly easily:
aggregate.CNT <- function(input.data, aggregate.by) {
  # Calculate mean ME, grouped by the column named in aggregate.by
  aggregate(input.data$ME,
            list(Station_ID = input.data[[aggregate.by]]),
            mean, na.rm = TRUE)
}
However, I'm stumped on two things. Firstly, I'd like to be able to call the function with a name for the 'group by' column (instead of the default Group.1), e.g.:
aggregate.CNT <- function(input.data, aggregate.by, group.name) {
  # Calculate mean ME, grouped by the column named in aggregate.by
  aggregate(input.data$ME,
            list(group.name = input.data[[aggregate.by]]),
            mean, na.rm = TRUE)
}
But this results in the column name in the output being the literal group.name rather than the value passed in that argument.
Secondly, building on that, I'd like to optionally specify more than one variable to group by, each with its own name. I tried using ... but that doesn't seem to be possible, since the additional arguments would need to be in the form:
list(arg1 = input.data[[arg2]], arg3 = input.data[[arg4]])
and I don't think there's a way to expand extra arguments into the arg3 = input.data[[arg4]] format.
So I was wondering if there is a way to use an argument to insert a whole string into the function, e.g.:
aggregate.CNT <- function(input.data, aggregate.by.list) {
  # Calculate mean ME by aggregating specified variable
  aggregate(input.data$ME,
            list(aggregate.by.list),
            mean, na.rm = TRUE)
}
aggregate.CNT(data, "Station_ID = data$OBS_SID, Init_Hour = data$FCST_INIT_HOUR")
If this isn't possible, suggestions for alternative methods are also greatly appreciated.
Thanks
Mal
Upvotes: 0
Views: 128
Reputation: 270338
Try this:
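aggregate.CNT <- function(data, by) {
  # Keep only ME and the grouping columns, so that `ME ~ .` groups by every column in `by`
  ag <- aggregate(ME ~ ., data[c("ME", by)], mean, na.rm = TRUE)
  # If `by` is a named vector, use its names as the output column names
  if (!is.null(names(by))) names(ag) <- c(names(by), "ME")
  ag
}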
Here is an example:
> DF <- data.frame(ME = 1:5, g = c(1, 1, 2, 2, 2), b = c(1, 1, 1, 2, 2))
> aggregate.CNT(DF, "g")
  g  ME
1 1 1.5
2 2 4.0
> aggregate.CNT(DF, c("g", "b"))
  g b  ME
1 1 1 1.5
2 2 1 3.0
3 2 2 4.5
> aggregate.CNT(DF, c(G = "g", B = "b"))
  G B  ME
1 1 1 1.5
2 2 1 3.0
3 2 2 4.5
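The same call should carry over to the data in the question. The following is a minimal sketch, assuming the sample rows are read into a data frame (called cnt here, a name chosen only for illustration) and that OBS_SID and FCST_INIT_HOUR are stored as character strings so the leading zeros are kept:

```r
# The function from above, repeated so this block runs on its own.
aggregate.CNT <- function(data, by) {
  ag <- aggregate(ME ~ ., data[c("ME", by)], mean, na.rm = TRUE)
  if (!is.null(names(by))) names(ag) <- c(names(by), "ME")
  ag
}

# Sample rows from the question. Storing the IDs and hours as character
# strings is an assumption about how the real data is read in.
cnt <- data.frame(
  FCST_VAR       = "WIND",
  OBS_SID        = rep(c("00000", "11111", "22222"), times = 2),
  FCST_INIT_HOUR = rep(c("12", "00"), each = 3),
  ME             = c(4.00000, -0.74948, -0.97792, -2.15822, 0.94710, -2.28489),
  stringsAsFactors = FALSE
)

# One grouping variable, renamed to Station_ID in the output.
by_station <- aggregate.CNT(cnt, c(Station_ID = "OBS_SID"))

# Two grouping variables, each with its own output name.
by_both <- aggregate.CNT(cnt, c(Station_ID = "OBS_SID",
                                Init_Hour  = "FCST_INIT_HOUR"))
```

Here by_station should have one row per station, and by_both one row per station and init hour combination.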
ADDED: The by vector may be named.
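If you would rather keep the list-based call to aggregate() from the question, the naming problem comes from list(group.name = ...): the left-hand side of = is taken as a literal name and is never evaluated. One way around that, sketched below, is to build the list first and name it with setNames(). The function name aggregate.CNT.list is made up for this illustration, and by is assumed to be a character vector of column names, optionally named, as above:

```r
# Hypothetical list-based variant; `by` is a character vector of column
# names, optionally named with the labels wanted in the output.
aggregate.CNT.list <- function(input.data, by) {
  # Use the column names themselves wherever no label was supplied.
  labels <- names(by)
  if (is.null(labels)) labels <- by
  labels[labels == ""] <- by[labels == ""]

  # Build the grouping list, then name it from the `labels` variable.
  groups <- setNames(as.list(input.data[unname(by)]), labels)

  out <- aggregate(input.data$ME, by = groups, FUN = mean, na.rm = TRUE)
  # aggregate() calls the value column "x" here; rename it to ME.
  names(out)[ncol(out)] <- "ME"
  out
}

DF <- data.frame(ME = 1:5, g = c(1, 1, 2, 2, 2), b = c(1, 1, 1, 2, 2))
res_named   <- aggregate.CNT.list(DF, c(G = "g", B = "b"))
res_unnamed <- aggregate.CNT.list(DF, "g")
```

On DF this should give the same group means as the formula version, so which one to use is mostly a matter of taste.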
Upvotes: 1