Create variable only if function argument is missing

Question

I need to write a function that takes dataframe column names as arguments and adds a new variable to the dataframe for each argument that is missing. For example, given a dataframe with columns foo and bar, the result should be:

   foo bar arg3
1    1   1   NA
2    2   2   NA
3    3   3   NA
4    4   4   NA
5    5   5   NA
6    6   6   NA
7    7   7   NA
8    8   8   NA
9    9   9   NA
10  10  10   NA

So far, I have this example:

df <- data.frame(foo = 1:10, bar = 1:10)

CheckData <- function(data, arg1 = NULL, arg2 = NULL, arg3 = NULL) {

  list_args <- list(arg1, arg2, arg3)

  # lapply(list_args, function(x) if(is.null(x)) data[[x]] <<- NA)
  # lapply(list_args, function(x) if(is.null(x)) data$x <<- NA)

  return(data)

}

CheckData(df, arg1 = 'foo', arg2 = 'bar')

So, I pass the function a dataframe with 2 columns, foo and bar, and leave arg3 at its default value, NULL. The 2 commented-out lines show the options I tried:

The first would ideally see that, since arg3 in list_args is NULL, a new variable data[[arg3]] should be created and populated with NAs. However, this doesn't work, because x is NULL inside the anonymous function rather than the name "arg3". I wonder whether some non-standard evaluation would help here, so that the argument is seen as a name/string instead of a NULL object.

The second does create a new variable, but it is called df$x, not df$arg3.

I could explicitly do this one by one, such as

CheckData <- function(data, arg1 = NULL, arg2 = NULL, arg3 = NULL) {

  if(is.null(arg1)) data$arg1 <- NA
  if(is.null(arg2)) data$arg2 <- NA
  if(is.null(arg3)) data$arg3 <- NA

  return(data)

}

CheckData(df, arg1 = 'foo', arg2 = 'bar')

but this would be inelegant and require prior knowledge of all possible variables, which isn't realistic for my needs.

This seems like it should be a fairly straightforward problem for advanced R programmers, but I'm stuck and can't find a solution despite some hours of searching and trial and error. Many thanks for any help.

Jonny · Accepted Answer

I managed to find a solution using ldply from the plyr package (with the magrittr pipe), together with assign rather than the assignment operator (<-), which lets me map the names of the missing arguments to the new variables, as below:

library(magrittr)
dat <- data.frame(foo = 1:10, bar = letters[1:10])

CheckData <- function(data, arg1 = NULL, arg2 = NULL, arg3 = NULL, arg4 = NULL) {

  # create dataframe of missing/unmatched arguments
  list_args <- 
    list(arg1, arg2, arg3, arg4) %>% 
    setNames(c('arg1', 'arg2', 'arg3', 'arg4')) %>% 
    plyr::ldply(function(x) if(is.null(x)) NA)

  # create new variables based on these missing arguments; map the arguments to these variables
  for(i in list_args[[1]]) {data[[i]] <- NA; assign(i, i)}

  return(data)

}

tmp <- CheckData(dat, arg1 = 'foo', arg2 = 'bar')

which gives the following dataframe, which is exactly what I wanted:

   foo bar arg3 arg4
1    1   a   NA   NA
2    2   b   NA   NA
3    3   c   NA   NA
4    4   d   NA   NA
5    5   e   NA   NA
6    6   f   NA   NA
7    7   g   NA   NA
8    8   h   NA   NA
9    9   i   NA   NA
10  10   j   NA   NA
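
For comparison, the same result can probably be reached in base R without plyr. This is only a minimal sketch, not the approach above: the function name check_data_base is hypothetical, and it assumes every argument other than data defaults to NULL. It reads the argument names from formals(), so they do not have to be typed out by hand, and it skips the assign(i, i) step.

```r
# Hypothetical base R variant; assumes all arguments except `data` default to NULL.
check_data_base <- function(data, arg1 = NULL, arg2 = NULL, arg3 = NULL, arg4 = NULL) {

  # names of every formal argument except the dataframe itself
  arg_names <- setdiff(names(formals()), "data")

  # current values of those arguments, looked up in this function's environment
  arg_values <- mget(arg_names, envir = environment())

  # keep only the names of arguments that were left as NULL
  missing_args <- arg_names[vapply(arg_values, is.null, logical(1))]

  # add one NA column per missing argument, named after the argument
  for (nm in missing_args) data[[nm]] <- NA

  data
}

dat <- data.frame(foo = 1:10, bar = letters[1:10])
tmp2 <- check_data_base(dat, arg1 = 'foo', arg2 = 'bar')
names(tmp2)
```

Adding another argument to the function signature is then enough for it to be picked up; nothing else in the body needs to change.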
