kishore

Reputation: 541

Passing multiple column names of a data frame in a R function

I have a data frame with 70+ columns. I need to perform some repetitive computations with a number of columns using each column separately.

Based on @Ananda's approach and feedback, here is a reworded, simplified example and solution. I am keeping the old thread at the end for the sake of the discussion.

Problem: Calculate sum of various columns of a data frame using a function where column names are specified as multiple arguments:

> df = data.frame(aa=1:10, bb=101:110, cc=201:210, dd=301:310)

> myFunc(df, aa, bb, cc)
aa series sum is 55 
bb series sum is 1055 
cc series sum is 2055 

> myFunc(df, aa, dd)
aa series sum is 55 
dd series sum is 3055 

> myFunc(df, dd)
dd series sum is 3055 
> 

The myFunc definition that accomplishes this is below:

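myFunc = function(data, ...){
  # Capture the arguments passed via ... as unevaluated symbols
  argList = match.call(expand.dots=FALSE)$...

  # seq_along() is safe when no columns are given (1:length() would give 1, 0)
  for(i in seq_along(argList)){
    colName = argList[[i]]
    # Evaluate the symbol inside the data frame to get the column values
    series_colName = eval(colName, envir=data, enclos=parent.frame())
    cat(deparse(colName), "series sum is", sum(series_colName), "\n")
  }
}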

This gives me a starting point to work with. If there is a better way to define myFunc, please let me know.
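One possible variation, offered only as a sketch: return the sums as a named vector instead of printing them, and check that every requested column exists before doing any work. The name `col_sums` and the error message are made up for illustration; the `match.call(expand.dots = FALSE)$...` capture is the same technique used above.

```r
# Hypothetical helper (not from the original post): returns a named
# numeric vector of column sums instead of printing them.
col_sums <- function(data, ...) {
  # Turn the unevaluated ... arguments into column names as strings
  cols <- as.character(match.call(expand.dots = FALSE)$...)

  # Fail early, with a readable message, if a column is not in the data frame
  unknown <- setdiff(cols, names(data))
  if (length(unknown) > 0) {
    stop("unknown column(s): ", paste(unknown, collapse = ", "))
  }

  # One sum per requested column; names are taken from the columns
  vapply(data[cols], function(x) as.numeric(sum(x)), numeric(1))
}

df <- data.frame(aa = 1:10, bb = 101:110, cc = 201:210, dd = 301:310)
result <- col_sums(df, aa, dd)
print(result)
```

Returning a value rather than calling `cat` makes the result reusable in later computations, at the cost of only supporting plain column names (not arbitrary expressions).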

Thanks for all the help

::::Old Discussion Thread:

I am still finding my way around R, so please bear with me. The following sample code simulates my first attempt, which failed. Where am I going wrong, and what is the R-ish way to do this type of computation? Please help.

myFunc = function(data, y, ...){
  argList = list(...)
  argList
  #for each arg in argList
    #do some processing with data, y and column arg
}

df = data.frame(aa=1:10, bb=101:110, cc=201:210, dd=301:310)
myFunc(df, aa, bb)
myFunc(df, aa, bb, cc)

And the error message is

Error in myFunc(df, aa, bb) : object 'bb' not found

Error in myFunc(df, aa, bb, cc) : object 'bb' not found

Adding further so that it becomes more clear.

myFunc(df, aa, c(2,4, 6))

works fine.

I intend to use eval, substitute and envir in further processing to extract the values of various columns, hence I would like to pass the column names in a natural way rather than as character strings. I hope that I am able to communicate my intention clearly.

Upvotes: 2

Views: 1791

Answers (2)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193507

I got this somewhere (most likely SO): Use match.call as follows...

myFunc <- function(data, ...) {
  argList <- as.character(match.call(expand.dots=FALSE)$...) 
  argList
}

myFunc(df, aa, bb)
# [1] "aa" "bb"
myFunc(df, aa, bb, cc)
# [1] "aa" "bb" "cc"

Your followups in the comments are very unclear, so I'll try to explain with an example.

In the example below, I've added a "y" argument to the function and, for the sake of demonstration, simply return the relevant values in a list.

myFunc <- function(data, y, ...) {
  argList <- as.character(match.call(expand.dots=FALSE)$...) 
  list(y, argList)
}

If we don't specify the "y =" part when using the function, R assumes that the second value should be used for "y" and all other values should be used for "...".

myFunc(df, aa, bb)
# Error in myFunc(df, aa, bb) : object 'aa' not found
myFunc(df, y = NULL, aa, bb)
# [[1]]
# NULL
# 
# [[2]]
# [1] "aa" "bb"

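You were not getting an error about "aa" because your version of the function made no reference to "y". R evaluates arguments lazily, so "aa" (matched to "y") was never looked up, while "bb" was evaluated by list(...) and failed.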
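A small sketch of that lazy-evaluation behaviour, assuming no object called `aa` exists in the calling environment. The function names `ignores_y` and `uses_y` are invented for this illustration.

```r
df <- data.frame(aa = 1:10, bb = 101:110)

# y is never touched in the body, so the argument is never evaluated
ignores_y <- function(data, y, ...) "ok"

# list(y) forces y, so R has to look up whatever was passed for it
uses_y <- function(data, y, ...) list(y)

# Runs without error even though no object named aa exists
no_error <- ignores_y(df, aa)

# Fails, because aa is looked up in the calling environment, not in df;
# the error message is captured as a string instead of stopping the script
msg <- tryCatch(uses_y(df, aa), error = function(e) conditionMessage(e))
print(no_error)
print(msg)
```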

Upvotes: 1

Christie Haskell Marsh

Reputation: 2244

Because aa, bb and cc do not exist as objects in the environment you are calling from. R needs to be told that they are columns of df:

myFunc(df, df$aa, df$bb)

myFunc(df, df$aa, df$bb, df$cc)
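Note that with this calling style the function receives the column values rather than the column names. A minimal sketch of what processing them could look like, using a made-up function name `sum_values`:

```r
# Hypothetical function: the ... arguments arrive as evaluated vectors
sum_values <- function(data, ...) {
  vals <- list(...)  # safe here, because df$aa etc. can be evaluated
  vapply(vals, function(x) as.numeric(sum(x)), numeric(1))
}

df <- data.frame(aa = 1:10, bb = 101:110, cc = 201:210, dd = 301:310)
sums <- sum_values(df, df$aa, df$bb)
print(sums)
```

The result is unnamed, since the names aa and bb are no longer available inside the function; that is the trade-off compared with capturing the names via match.call.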

Upvotes: 0
