Jonny

Reputation: 2783

Create variable only if function argument is missing

I need to write a function that takes dataframe column names as arguments and adds a new variable to the dataframe for any argument that is missing, for example going from

   foo bar
1    1   1
2    2   2
3    3   3
4    4   4
5    5   5
6    6   6
7    7   7
8    8   8
9    9   9
10  10  10

to:

   foo bar arg3
1    1   1   NA
2    2   2   NA
3    3   3   NA
4    4   4   NA
5    5   5   NA
6    6   6   NA
7    7   7   NA
8    8   8   NA
9    9   9   NA
10  10  10   NA

So far, I have this example:

df <- data.frame(foo = 1:10, bar = 1:10)

CheckData <- function(data, arg1 = NULL, arg2 = NULL, arg3 = NULL) {

  list_args <- list(arg1, arg2, arg3)

  # lapply(list_args, function(x) if(is.null(x)) data[[x]] <<- NA)
  # lapply(list_args, function(x) if(is.null(x)) data$x <<- NA)

  return(data)

}

CheckData(df, arg1 = 'foo', arg2 = 'bar')

I pass the function a dataframe with two columns, foo and bar, and leave arg3 at its default value, NULL. The two commented-out lines are the two options I tried:

  1. the first would ideally see that, since arg3 in list_args is NULL, a new variable data[[arg3]] should be created & populated with NAs. However, this doesn't work, and I wonder if perhaps some non-standard evaluation would help here, so that it sees this not as a NULL object but as a name/string.
  2. the second works in this way, but creates a variable called df$x, not df$arg3.
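
To make the difference between the two options concrete, here is a minimal sketch (the three-row `df` and the names `args` and `unset` are made up for illustration). `$` takes the name written after it literally, while `[[` evaluates its argument. Inside `lapply` the function only ever receives the value `NULL`, never the name `arg3`, so the names have to be carried along explicitly, for example in a named list.

```r
df <- data.frame(foo = 1:3, bar = 1:3)

x <- "arg3"
df$x <- NA       # `$` does not evaluate x: the column is literally named "x"
df[[x]] <- NA    # `[[` evaluates x: the column is named "arg3"
names(df)
# "foo" "bar" "x" "arg3"

# Keeping the argument names next to their values in a named list
# makes it possible to recover which arguments were left NULL.
args <- list(arg1 = "foo", arg2 = "bar", arg3 = NULL)
unset <- names(args)[vapply(args, is.null, logical(1))]
unset
# "arg3"
```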

I could explicitly do this one by one, such as

CheckData <- function(data, arg1 = NULL, arg2 = NULL, arg3 = NULL) {

  if(is.null(arg1)) data$arg1 <- NA
  if(is.null(arg2)) data$arg2 <- NA
  if(is.null(arg3)) data$arg3 <- NA      

  return(data)

}

CheckData(df, arg1 = 'foo', arg2 = 'bar')

but this would be inelegant and require prior knowledge of all possible variables, which isn't realistic for my needs.

This seems like it should be a fairly straightforward problem for advanced R programmers, but I'm blocked and can't find a solution despite some hours of searching and trial and error. Many thanks for any help.

Upvotes: 0

Views: 56

Answers (2)

Jonny

Reputation: 2783

I managed to find a solution that uses plyr::ldply to collect the names of the arguments that were left NULL, and assign() instead of the assignment operator (<-) to bind each of those argument names to a string of the same name inside the function, as below:

library(magrittr)
dat <- data.frame(foo = 1:10, bar = letters[1:10])

CheckData <- function(data, arg1 = NULL, arg2 = NULL, arg3 = NULL, arg4 = NULL) {

  # create dataframe of missing/unmatched arguments
  list_args <- 
    list(arg1, arg2, arg3, arg4) %>% 
    setNames(c('arg1', 'arg2', 'arg3', 'arg4')) %>% 
    plyr::ldply(function(x) if(is.null(x)) NA)

  # create new variables based on these missing arguments; map the arguments to these variables
  for(i in list_args[[1]]) {data[[i]] <- NA; assign(i, i)}

  return(data)

}

tmp <- CheckData(dat, arg1 = 'foo', arg2 = 'bar')

which gives the following dataframe, which is exactly what I wanted:

   foo bar arg3 arg4
1    1   a   NA   NA
2    2   b   NA   NA
3    3   c   NA   NA
4    4   d   NA   NA
5    5   e   NA   NA
6    6   f   NA   NA
7    7   g   NA   NA
8    8   h   NA   NA
9    9   i   NA   NA
10  10   j   NA   NA
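
For comparison, the same data frame can probably be produced in base R without plyr or assign. This is only a sketch under the same four-argument signature: `CheckDataBase`, `args` and `unset` are hypothetical names, and it leaves out the step of binding the argument names to strings inside the function.

```r
dat <- data.frame(foo = 1:10, bar = letters[1:10])

CheckDataBase <- function(data, arg1 = NULL, arg2 = NULL, arg3 = NULL, arg4 = NULL) {
  # a named list keeps each argument's name next to its value
  args <- list(arg1 = arg1, arg2 = arg2, arg3 = arg3, arg4 = arg4)

  # names of the arguments that were left at their NULL default
  unset <- names(args)[vapply(args, is.null, logical(1))]

  # add one NA column per unset argument (does nothing if none are unset)
  for (nm in unset) data[[nm]] <- NA

  data
}

tmp <- CheckDataBase(dat, arg1 = 'foo', arg2 = 'bar')
names(tmp)
# "foo" "bar" "arg3" "arg4"
```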

Upvotes: 0

Konrad Rudolph

Reputation: 545508

The following does what you want:

CheckArgs = function (df, ...) {
    args = list(...)
    for (arg in names(args)) {
        if (! arg %in% names(df))
            df[[arg]] = args[[arg]]
    }

    df
}

Alternatively, the following does the same, but without the loop:

CheckArgs = function (df, ...) {
    args = list(...)
    missing = ! names(args) %in% names(df)
    df[names(args)[missing]] = args[missing]
    df
}

Usage:

df = CheckArgs(df, a = NA, b = NA, c = NA)
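
As a quick check of that usage line, here is a sketch that applies the loop version to the data frame from the question. The column name `flag` and its default of 0 are invented for the example. The point is that an existing column (`foo`) is left untouched, and that the supplied default does not have to be NA.

```r
CheckArgs = function (df, ...) {
    args = list(...)
    for (arg in names(args)) {
        # only add columns that are not already present
        if (! arg %in% names(df))
            df[[arg]] = args[[arg]]
    }

    df
}

df = data.frame(foo = 1:10, bar = 1:10)
out = CheckArgs(df, foo = NA, arg3 = NA, flag = 0)
names(out)
# "foo" "bar" "arg3" "flag"
```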

If you only ever want to fill the new columns with NAs, then a better solution would be to have a function that allows you just to specify the required names:

df = CheckArgs(df, c('a', 'b', 'c'))

… or something along these lines. This can of course easily be done in much the same way:

CheckArgs = function (df, required_names) {
    missing = ! required_names %in% names(df)
    df[required_names[missing]] = NA
    df
}
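
A short usage sketch for this last version, again on the data frame from the question (the names `arg3` and `arg4` are only examples):

```r
CheckArgs = function (df, required_names) {
    missing = ! required_names %in% names(df)
    df[required_names[missing]] = NA
    df
}

df = data.frame(foo = 1:10, bar = 1:10)
out = CheckArgs(df, c('foo', 'bar', 'arg3', 'arg4'))
names(out)
# "foo" "bar" "arg3" "arg4"
```

The new columns hold the logical NA. If a particular column type matters, a typed constant such as NA_integer_ or NA_character_ could be assigned instead.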

Upvotes: 1
