Reputation: 541
I have a data frame with 70+ columns. I need to perform some repetitive computations with a number of columns using each column separately.
Based on @Ananda's approach and feedback, here is a reworded, simplified example and solution. I am keeping the old thread at the end for the sake of discussion.
Problem: Calculate the sum of various columns of a data frame using a function where the column names are passed as separate arguments:
> df = data.frame(aa=1:10, bb=101:110, cc=201:210, dd=301:310)
> myFunc(df, aa, bb, cc)
aa series sum is 55
bb series sum is 1055
cc series sum is 2055
> myFunc(df, aa, dd)
aa series sum is 55
dd series sum is 3055
> myFunc(df, dd)
dd series sum is 3055
>
The myFunc definition that accomplishes this is below:
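myFunc = function(data, ...){
  # Capture the unevaluated column names passed through ...
  argList = match.call(expand.dots=FALSE)$...
  # seq_along() also behaves correctly when no columns are supplied
  for(i in seq_along(argList)){
    colName = argList[[i]]
    # colName is already a symbol, so it can be evaluated directly inside data
    series_colName = eval(colName, envir=data, enclos=parent.frame())
    cat(deparse(colName), "series sum is", sum(series_colName), "\n")
  }
}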
This gives me a starting point to work with. If there is a better way to define myFunc, please let me know.
Thanks for all the help
::::Old Discussion Thread:
I am still finding my way in R, so please bear with me. The following sample code simulates my first try, which failed. Where am I going wrong, and what is the idiomatic R way to do this type of computation?
myFunc = function(data, y, ...){
argList = list(...)
argList
#for each arg in argList
#do some processing with data, y and column arg
}
df = data.frame(aa=1:10, bb=101:110, cc=201:210, dd=301:310)
myFunc(df, aa, bb)
myFunc(df, aa, bb, cc)
And the error message is
Error in myFunc(df, aa, bb) : object 'bb' not found
Error in myFunc(df, aa, bb, cc) : object 'bb' not found
To add a further detail, the call
myFunc(df, aa, c(2, 4, 6))
works fine.
I intend to use eval, substitute and envir in further processing to extract the values of various columns, hence I would like to pass the column names in a natural way rather than as character strings. I hope that I am able to communicate my intention clearly.
Upvotes: 2
Views: 1791
Reputation: 193507
I got this somewhere (most likely SO): use match.call as follows...
myFunc <- function(data, ...) {
  argList <- as.character(match.call(expand.dots=FALSE)$...)
  argList
}
myFunc(df, aa, bb)
# [1] "aa" "bb"
myFunc(df, aa, bb, cc)
# [1] "aa" "bb" "cc"
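Once the names are available as strings, each column can be pulled out of the data frame with `[[`. Below is a minimal sketch, assuming the goal is the per-column sum from the question; `sumCols` is a hypothetical name used only for illustration.

```r
df <- data.frame(aa = 1:10, bb = 101:110, cc = 201:210, dd = 301:310)

# sumCols is a hypothetical helper: it captures the unquoted column
# names as strings and returns the sum of each named column.
sumCols <- function(data, ...) {
  argList <- as.character(match.call(expand.dots = FALSE)$...)
  # data[[nm]] extracts a single column by its name
  vapply(argList, function(nm) sum(data[[nm]]), numeric(1))
}

sumCols(df, aa, dd)
```

The call returns a named numeric vector with one sum per requested column, so the names stay attached to the results.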
Your followups in the comments are very unclear, so I'll try to explain with an example.
In the below, I've added a "y" argument to the function and for the sake of demonstration, let's just return the relevant values in a list.
myFunc <- function(data, y, ...) {
  argList <- as.character(match.call(expand.dots=FALSE)$...)
  list(y, argList)
}
If we don't specify the "y =" part when using the function, R assumes that the second value should be used for "y" and all other values should be used for "...".
myFunc(df, aa, bb)
# Error in myFunc(df, aa, bb) : object 'aa' not found
myFunc(df, y = NULL, aa, bb)
# [[1]]
# NULL
#
# [[2]]
# [1] "aa" "bb"
You were not getting an error about "aa" because your version of the function made no reference to "y", so R never tried to evaluate it.
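This comes down to R's lazy evaluation: an argument is only evaluated when the function body first uses it. Here is a small sketch with two hypothetical functions, `ignoresY` and `usesY`, assuming no object called `aa` exists in the workspace.

```r
# Hypothetical functions to illustrate lazy evaluation of arguments.
ignoresY <- function(data, y) nrow(data)           # y is never touched
usesY    <- function(data, y) { y; nrow(data) }    # y is forced here

df <- data.frame(aa = 1:10, bb = 101:110)

# Runs without error: aa is never looked up because y is never used.
n <- ignoresY(df, aa)

# Fails with "object 'aa' not found" as soon as y is evaluated;
# try() is used so the sketch keeps running.
res <- try(usesY(df, aa), silent = TRUE)
failed <- inherits(res, "try-error")
```

With no `aa` in the workspace, `n` holds the row count of `df` and `failed` is `TRUE`.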
Upvotes: 1
Reputation: 2244
Because aa, bb and cc do not exist as objects in the environment the function is called from. R needs to be told that they are columns of df:
myFunc(df, df$aa, df$bb)
myFunc(df, df$aa, df$bb, df$cc)
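With this calling style a function based on list(...) runs, because each argument is an ordinary vector by the time the function sees it. The trade-off is that the column names are no longer available inside the function. The sketch below assumes the per-column sum from the question; `sumValues` is a hypothetical name.

```r
df <- data.frame(aa = 1:10, bb = 101:110, cc = 201:210, dd = 301:310)

# sumValues is a hypothetical helper: the arguments in ... are
# plain vectors once evaluated, so list(...) works without error.
sumValues <- function(data, ...) {
  argList <- list(...)
  vapply(argList, sum, numeric(1))
}

sums <- sumValues(df, df$aa, df$bb)
```

The result is an unnamed numeric vector, so any labelling of the output has to be supplied separately.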
Upvotes: 0