Reputation: 2783
I need to write a function that takes dataframe column names as arguments and adds new variables to the dataframe if any of them are missing, for example going from
foo bar
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
to:
foo bar arg3
1 1 1 NA
2 2 2 NA
3 3 3 NA
4 4 4 NA
5 5 5 NA
6 6 6 NA
7 7 7 NA
8 8 8 NA
9 9 9 NA
10 10 10 NA
So far, I have this example:
df <- data.frame(foo = 1:10, bar = 1:10)
CheckData <- function(data, arg1 = NULL, arg2 = NULL, arg3 = NULL) {
  list_args <- list(arg1, arg2, arg3)
  # lapply(list_args, function(x) if(is.null(x)) data[[x]] <<- NA)
  # lapply(list_args, function(x) if(is.null(x)) data$x <<- NA)
  return(data)
}
CheckData(df, arg1 = 'foo', arg2 = 'bar')
So, I pass the function a dataframe with 2 columns, foo and bar, which leaves arg3 at NULL, the default value. The 2 commented-out lines are 2 attempts at the same thing: when arg3 in list_args is NULL, a new variable data[[arg3]] should be created and populated with NAs. However, this doesn't work, and I wonder whether some non-standard evaluation would help here, so that it sees this not as a NULL object but as a name/string. The second attempt would also create df$x, not df$arg3. I could explicitly do this one by one, such as
CheckData <- function(data, arg1 = NULL, arg2 = NULL, arg3 = NULL) {
  if(is.null(arg1)) data$arg1 <- NA
  if(is.null(arg2)) data$arg2 <- NA
  if(is.null(arg3)) data$arg3 <- NA
  return(data)
}
CheckData(df, arg1 = 'foo', arg2 = 'bar')
but this would be inelegant and require prior knowledge of all possible variables, which isn't realistic for my needs.
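To make the difference between the two commented-out forms concrete, here is a minimal sketch on a toy dataframe (the names d, by_dollar and by_bracket are only for illustration). It shows that $ uses the name exactly as written, while [[ evaluates the variable holding the name:

```r
d <- data.frame(foo = 1:3, bar = 1:3)
x <- "arg3"

# `$` takes the name literally, so this creates a column called "x"
by_dollar <- d
by_dollar$x <- NA

# `[[` evaluates x first, so this creates a column called "arg3"
by_bracket <- d
by_bracket[[x]] <- NA

names(by_dollar)   # "foo" "bar" "x"
names(by_bracket)  # "foo" "bar" "arg3"
```

This only works when x holds a string; if x is NULL, as it is for an unset argument, there is no name left to index with.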
This seems like it should be a fairly straightforward problem for advanced R programmers, but I'm blocked and can't find a solution despite some hours of searching and trial and error. Many thanks for any help.
Upvotes: 0
Views: 56
Reputation: 2783
I managed to find a solution using some inherent functionality within the plyr package and using assign instead of the assignment operator (<-), which allows me to have the names of the arguments mapped to the variables, as below:
library(magrittr)
dat <- data.frame(foo = 1:10, bar = letters[1:10])
CheckData <- function(data, arg1 = NULL, arg2 = NULL, arg3 = NULL, arg4 = NULL) {
  # create dataframe of missing/unmatched arguments
  list_args <-
    list(arg1, arg2, arg3, arg4) %>%
    setNames(c('arg1', 'arg2', 'arg3', 'arg4')) %>%
    plyr::ldply(function(x) if(is.null(x)) NA)
  # create new variables based on these missing arguments; map the arguments to these variables
  for(i in list_args[[1]]) {data[[i]] <- NA; assign(i, i)}
  return(data)
}
tmp <- CheckData(dat, arg1 = 'foo', arg2 = 'bar')
which gives the following dataframe, which is exactly what I wanted:
foo bar arg3 arg4
1 1 a NA NA
2 2 b NA NA
3 3 c NA NA
4 4 d NA NA
5 5 e NA NA
6 6 f NA NA
7 7 g NA NA
8 8 h NA NA
9 9 i NA NA
10 10 j NA NA
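For comparison, the same result can probably be had without plyr or magrittr. The following is only a sketch under that assumption: CheckDataBase is a hypothetical name, and it keeps the fixed arg1 to arg4 signature from above, using vapply with is.null to find the arguments that were left unset.

```r
dat <- data.frame(foo = 1:10, bar = letters[1:10])

CheckDataBase <- function(data, arg1 = NULL, arg2 = NULL, arg3 = NULL, arg4 = NULL) {
  # naming the elements explicitly keeps the NULL entries in the list
  args <- list(arg1 = arg1, arg2 = arg2, arg3 = arg3, arg4 = arg4)
  # names of the arguments that are still at their NULL default
  unset <- names(args)[vapply(args, is.null, logical(1))]
  # add one NA column per unset argument
  for (nm in unset) data[[nm]] <- NA
  data
}

tmp_base <- CheckDataBase(dat, arg1 = 'foo', arg2 = 'bar')
names(tmp_base)  # "foo" "bar" "arg3" "arg4"
```

Unlike the version above, this sketch does not call assign, so the arguments themselves stay NULL inside the function.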
Upvotes: 0
Reputation: 545508
The following does what you want:
CheckArgs = function (df, ...) {
  args = list(...)
  for (arg in names(args)) {
    if (! arg %in% names(df))
      df[[arg]] = args[[arg]]
  }
  df
}
Alternatively, the following does the same, but without the loop:
CheckArgs = function (df, ...) {
  args = list(...)
  missing = ! names(args) %in% names(df)
  df[names(args)[missing]] = args[missing]
  df
}
Usage:
df = CheckArgs(df, a = NA, b = NA, c = NA)
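As a quick check, here is a self-contained run of the loop-based version on the dataframe from the question (the definition is repeated so the snippet runs on its own; the variable name out is just for illustration). Columns that already exist are left untouched, and only the missing one is added:

```r
CheckArgs = function (df, ...) {
  args = list(...)
  for (arg in names(args)) {
    # only add columns that are not there yet
    if (! arg %in% names(df))
      df[[arg]] = args[[arg]]
  }
  df
}

df = data.frame(foo = 1:10, bar = 1:10)
# foo already exists, so its NA is ignored; arg3 is new and gets filled with NA
out = CheckArgs(df, foo = NA, arg3 = NA)
names(out)  # "foo" "bar" "arg3"
```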
If you only ever want to fill the vector with NAs, then a better solution would be to have a function that allows you just to specify the required names:
df = CheckArgs(df, c('a', 'b', 'c'))
… or something along these lines. This can of course easily be done in much the same way:
CheckArgs = function (df, required_names) {
  missing = ! required_names %in% names(df)
  df[required_names[missing]] = NA
  df
}
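Applied to the dataframe from the question, that last version could be used roughly as follows. This is a sketch; the required names and the variable name out are chosen only for illustration, and the definition is repeated so the snippet is self-contained:

```r
CheckArgs = function (df, required_names) {
  # TRUE for every required name that is not yet a column
  missing = ! required_names %in% names(df)
  df[required_names[missing]] = NA
  df
}

df = data.frame(foo = 1:10, bar = 1:10)
out = CheckArgs(df, c('foo', 'bar', 'arg3', 'arg4'))
names(out)  # "foo" "bar" "arg3" "arg4"
```

Because the fill value is a plain NA, the new columns are logical until something else is assigned to them.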
Upvotes: 1