muadhib

Reputation: 13

Using lapply to generate new variables across data sets, conditional on not existing

Let's say I have three data sets:

df1 <- data.frame(var1 = c(1,2,3), var2 = c(1,2,3))
df2 <- data.frame(var1 = c(1,2,3), var2 = c(1,2,3))
df3 <- data.frame(var1 = c(1,2,3), var2 = c(1,2,3), var3 = c(1,2,3))

I would like to check whether a variable "var3" exists in each data set. If it doesn't, I would like to create an empty variable called "var3". Here is what I am trying:

dframes <- list(df1,df2,df3)

lapply(dframes, function(df) { 
   ifelse("var3" %in% colnames(df), print("var3 exists"), df$var3 <- NA)
})

The output comes out as:

[[1]]
[1] NA

[[2]]
[1] NA

[[3]]
[1] "var3 exists"

And the desired "var3" variable isn't generated for the first two data sets - they still only contain "var1" and "var2".

Your help is appreciated.

Upvotes: 1

Views: 1430

Answers (1)

mlegge

Reputation: 6913

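Putting what everyone has said in the comments into a full answer: ifelse() is meant for element-wise selection on vectors, and the assignment inside your anonymous function only changes a local copy of df, which is then thrown away because the function returns the result of ifelse(). Use a plain if and return the data frame from the function: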

df1 <- data.frame(var1 = c(1,2,3), var2 = c(1,2,3))
df2 <- data.frame(var1 = c(1,2,3), var2 = c(1,2,3))
df3 <- data.frame(var1 = c(1,2,3), var2 = c(1,2,3), var3 = c(1,2,3))

dframes <- list(df1,df2,df3)

dfframes_fmt <- lapply(dframes, function(df) { 
  if(! "var3" %in% colnames(df)) {
    df$var3 <- NA
  }
  df
})

> dfframes_fmt
[[1]]
  var1 var2 var3
1    1    1   NA
2    2    2   NA
3    3    3   NA

[[2]]
  var1 var2 var3
1    1    1   NA
2    2    2   NA
3    3    3   NA

[[3]]
  var1 var2 var3
1    1    1    1
2    2    2    2
3    3    3    3
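
If you need this for more than one column, the same idea can be wrapped in a small helper. This is only a sketch: the name add_if_missing and its arguments are my own invention (not from any package), and the two toy data frames are made up for illustration. It also shows that the input list is left untouched, which is why the result of lapply has to be stored.

```r
# Hypothetical helper: adds column `col` filled with `value`
# only when the data frame does not already have it.
add_if_missing <- function(df, col, value = NA) {
  if (!col %in% colnames(df)) {
    df[[col]] <- value
  }
  df  # return the (possibly modified) copy so lapply can collect it
}

dframes <- list(
  data.frame(var1 = 1:3, var2 = 1:3),
  data.frame(var1 = 1:3, var2 = 1:3, var3 = 1:3)
)

result <- lapply(dframes, add_if_missing, col = "var3")

# The input list is unchanged; only `result` has the new column.
"var3" %in% colnames(dframes[[1]])  # FALSE
"var3" %in% colnames(result[[1]])   # TRUE
```

The data frame that already had var3 passes through unchanged.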

To write the modified data frames back to their original names, you can do this:

dfnames <- c("df1", "df2", "df3")
# assemble a named list of data frames from their names
dframes <- mget(dfnames)

for (k in seq_along(dframes)) {
  set <- dframes[[k]]
  # add var3 only if the data frame does not already have it
  if (!"var3" %in% colnames(set)) {
    set$var3 <- NA
  }
  # assign the df back to the original name
  assign(dfnames[k], set)
}


> df1
  var1 var2 var3
1    1    1   NA
2    2    2   NA
3    3    3   NA
> df2
  var1 var2 var3
1    1    1   NA
2    2    2   NA
3    3    3   NA
> df3
  var1 var2 var3
1    1    1    1
2    2    2    2
3    3    3    3
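
If you'd rather skip the explicit loop, one possible variant is to keep everything in a named list and write it back in one go with list2env(). This is a sketch under the assumption that df1, df2 and df3 live in the environment you are calling from (typically the global environment); the data is the same as in the question.

```r
df1 <- data.frame(var1 = c(1, 2, 3), var2 = c(1, 2, 3))
df2 <- data.frame(var1 = c(1, 2, 3), var2 = c(1, 2, 3))
df3 <- data.frame(var1 = c(1, 2, 3), var2 = c(1, 2, 3), var3 = c(1, 2, 3))

dfnames <- c("df1", "df2", "df3")

# mget() returns a list named after the objects it looked up
dframes <- mget(dfnames)

# add var3 where it is missing and return the data frame each time
dframes <- lapply(dframes, function(df) {
  if (!"var3" %in% colnames(df)) {
    df$var3 <- NA
  }
  df
})

# copy each list element back to a variable with the same name
invisible(list2env(dframes, envir = environment()))

colnames(df1)
```

Because lapply() keeps the names of the list, list2env() knows which variable each data frame belongs to. That said, keeping the data frames in a list throughout is usually easier than juggling separate variables.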

Upvotes: 1
