Create data.frame variables from list

Question

Is it possible to create and assign a name to an object "by reference"? For example, I have a large data.frame and I need to do some basic operations to some of the columns in it. I put the columns, grouping and operations I need to do in lists:

exec_group_list = c("nbhd", "state", "use")
exec_var_list   = c("land", "imp", "assmt", "landp", "impp", "assmtp")
exec_func_list  = c("sum", "mean", "median", "max", "min", "sd")

So the "land" column will be grouped by "nbhd", and then "sum", "mean", "median", etc. will be applied to it. Then the same will be done to the "imp" column, and so on. Then I will repeat the whole process, this time grouping by "state"... rinse, lather, and repeat, as follows:
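For a single column and a single grouping, the step described above can be sketched with base R's `aggregate()` formula interface. The data frame `toy` and its values are hypothetical stand-ins for the real `sales` data.

```r
# Hypothetical stand-in for the asker's `sales` data frame
toy <- data.frame(nbhd = c("a", "a", "b", "b"),
                  land = c(1, 2, 3, 4))

# Group "land" by "nbhd" and apply one summary function per call
land_sum  <- aggregate(land ~ nbhd, data = toy, FUN = sum)
land_mean <- aggregate(land ~ nbhd, data = toy, FUN = mean)
print(land_sum)
print(land_mean)
```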

for (eachg in exec_group_list){
  group_by_field = eachg
  group_by = eval(parse(text=paste("sales$",group_by_field)))
  group_by_lst = list(group_by)
  print(paste("Grouping by:", eachg))
  #CREATE DATA.FRAME FOR GROUP HERE
  for (eachv in exec_var_list){
    var = eval(parse(text=paste("sales$",eachv)))
    print(paste("On column:", eachv))
    for (eachf in exec_func_list){
      print(paste("Calculating:", eachf))
      tempt = (aggregate(var, group_by_lst, eachf))
      colnames(tempt) = c(eachg, paste(eachv,".",eachf, sep=""))
      print(tempt)
      #APPEND COLUMNS TO GROUP DATA.FRAME
    }
  }
}

I figured out how to use eval() to reference columns whose names are stored in a list, so I can loop through the grouping list and the column list and apply the same operations using the values in those lists.
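For comparison, a column whose name is held in a character variable can also be extracted with `[[`, which avoids building code as text. A minimal sketch using a small hypothetical `sales` data frame (`paste0()` is used here so the generated text is exactly `sales$nbhd`):

```r
# Hypothetical small data frame standing in for the real `sales`
sales <- data.frame(nbhd = c("a", "b", "a"), land = c(5, 7, 9))
group_by_field <- "nbhd"

# Building the expression as text and evaluating it ...
via_parse <- eval(parse(text = paste0("sales$", group_by_field)))
# ... gives the same vector as indexing the column by its name
via_index <- sales[[group_by_field]]

print(identical(via_parse, via_index))
```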

But I'd like to store the info in a data.frame named after the grouping field. So for example, if I am grouping by "nbhd" I'd like to create an empty data.frame named "by_nbhd".

I tried something similar to eval(parse(text=paste("by_","nbhd", sep=""))) = data.frame("nbhd"=NA) but I get an error.
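A minimal sketch of two base R approaches, offered as an illustration rather than the accepted answer's method. `assign()` takes the new variable's name as a character string, so the name can be computed with `paste0()`, and `get()` retrieves it by name. Alternatively, the data frames can be kept in a named list. The names `group` and `results` are illustrative.

```r
group <- "nbhd"

# Option 1: assign() creates a variable whose name is computed at run time
assign(paste0("by_", group), setNames(data.frame(NA), group))
print(get(paste0("by_", group)))   # look it up by name later

# Option 2: keep the data frames in a named list instead
results <- list()
results[[paste0("by_", group)]] <- setNames(data.frame(NA), group)
print(names(results))
```

The named-list version is usually easier to loop over afterwards, since all the results live in one object.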

Does anyone know whether this is possible? Any help will be appreciated. Thank you in advance.

IRTFM · Accepted Answer

Rather than asking about "creating an object by reference", which brings up all sorts of extraneous associations with the distinction between "call by value" and "call by reference", you should be asking for help with "computing on the language". Presumably you have a dataset (which you have not described very well) with categorical columns named "nbhd", "state", and "use", and numeric columns named "land", "imp", "assmt", "landp", "impp", and "assmtp". You want to examine, one at a time, 6 kinds of summary statistic for each of the 6 numeric columns within the categories of each of the 3 categorical columns (3 x 6 x 6 tables).

Write a prototype of a function that delivers one summary table for a particular function, a particular numeric column, and a particular categorical column.

 tabfn <- function(dfrm, numcol, catcol, fn){
   # apply fn to dfrm[[numcol]] separately within each level of dfrm[[catcol]]
   tapply(dfrm[[numcol]], dfrm[[catcol]], fn)
 }
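A quick check of `tabfn` on a tiny hypothetical data frame (`toy` is an assumption, not the asker's data). `tapply()` returns one value per category, labelled with the category names:

```r
tabfn <- function(dfrm, numcol, catcol, fn) {
  # Apply fn to dfrm[[numcol]] separately within each level of dfrm[[catcol]]
  tapply(dfrm[[numcol]], dfrm[[catcol]], fn)
}

toy <- data.frame(nbhd = c("a", "a", "b"), land = c(1, 2, 10))
res <- tabfn(toy, "land", "nbhd", sum)
print(res)
```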

It's easiest to create a list of first-class functions rather than using eval(parse(text = ...)) on character strings.
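Because R functions are ordinary values, they can be stored in a named list and called directly. Base R's `match.fun()` also turns a function name given as a string into the function itself, again without `eval(parse())`. A small illustration with a hypothetical vector `x`:

```r
# Functions stored as values in a named list
fns <- list(sum = sum, mean = mean, max = max)
x <- c(1, 2, 3)

# Call each stored function on x; sapply keeps the list names
out <- sapply(fns, function(f) f(x))
print(out)

# A function name held as a string can be resolved with match.fun()
f <- match.fun("median")
print(f(x))
```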

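# a named list, so each function can be labelled in the output
exec_func_list <- list(sum = sum, mean = mean, median = median,
                       max = max, min = min, sd = sd)
for (eachg in exec_group_list){
  print(paste("Grouping by:", eachg))
  for (eachv in exec_var_list){
    print(paste("On column:", eachv))
    for (eachf in names(exec_func_list)){
      print(paste("Calculating:", eachf))
      # one table: column eachv summarised by eachf within levels of eachg
      print(tabfn(sales, eachv, eachg, exec_func_list[[eachf]]))
    }
  }
}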

Unfortunately, this is mostly untested guesswork, since you have not produced a minimal reproducible example.
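To store the tables rather than print them, as the question asks, one possible sketch collects one data frame per grouping column in a named list with elements such as `by_nbhd`. The helper `summarise_by` and the `toy` data are hypothetical; the summary columns are named `<column>.<function>` as in the question's code.

```r
# Hypothetical helper: one row per category of catcol and one column per
# (numeric column, function) pair, named like "land.sum"
summarise_by <- function(dfrm, catcol, numcols, fns) {
  out <- NULL
  for (v in numcols) {
    for (f in names(fns)) {
      tab <- tapply(dfrm[[v]], dfrm[[catcol]], fns[[f]])
      if (is.null(out)) {
        # first pass: start the data frame with the category labels
        out <- data.frame(names(tab))
        names(out) <- catcol
      }
      out[[paste(v, f, sep = ".")]] <- as.vector(tab)
    }
  }
  out
}

# Hypothetical data standing in for `sales`
toy <- data.frame(nbhd  = c("a", "a", "b", "b"),
                  state = c("x", "y", "x", "y"),
                  land  = c(1, 2, 3, 4),
                  imp   = c(10, 20, 30, 40))
fns <- list(sum = sum, mean = mean)

# One data frame per grouping column: results$by_nbhd, results$by_state
results <- list()
for (eachg in c("nbhd", "state")) {
  results[[paste0("by_", eachg)]] <- summarise_by(toy, eachg, c("land", "imp"), fns)
}
print(results$by_nbhd)
```

Every `tapply()` call uses the same grouping column, so the rows line up across the summary columns of each data frame.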
