EML

Reputation: 671

Overwrite elements of list in R after adding missing column

For a list of data frames, I would like to check whether a column is present and, if it is not, add that column filled with NA to the data frames that lack it. Most importantly, I want to overwrite the old data frames.

Datasets:

df1 <- data.frame(a=c(1,2), b=c(3,NA))
df2 <- data.frame(b=c(1,2), c=c(3,NA))
df_list=list(df1, df2)
name <- "a"

My attempt:

df_list <- lapply(df_list, function(x) x[name[!(name %in% colnames(x))]] = NA) 

I am looking for this result:

> df_list
[[1]]
  a  b
1 1  3
2 2 NA

[[2]]
  b  c  a
1 1  3 NA
2 2 NA NA

Upvotes: 1

Views: 58

Answers (3)

GKi

Reputation: 39717

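Modifying your code: what was missing was returning the updated x at the end of the function. Alternatively, you can use setdiff. Assign the result back to df_list to overwrite the old list.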

#lapply(df_list, function(x) x[name[!(name %in% colnames(x))]] = NA) #Your original code
lapply(df_list, function(x) {x[name[!(name %in% colnames(x))]] = NA; x}) #Modified
lapply(df_list, function(x) {x[,setdiff(name, names(x))] <- NA; x}) #Alternative
#[[1]]
#  a  b
#1 1  3
#2 2 NA
#
#[[2]]
#  b  c  a
#1 1  3 NA
#2 2 NA NA
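
To see why the original attempt loses the data frames, note that the value of an assignment in R is the assigned right-hand side, so a function whose last expression is an assignment returns that value rather than the modified x. A minimal sketch, reusing the question's data (the names broken and fixed are only for illustration):

```r
df1 <- data.frame(a = c(1, 2), b = c(3, NA))
df2 <- data.frame(b = c(1, 2), c = c(3, NA))
df_list <- list(df1, df2)
name <- "a"

# The last expression is an assignment, so each call returns the
# assigned value (NA) instead of the data frame
broken <- lapply(df_list, function(x) {
  x[name[!(name %in% colnames(x))]] <- NA
})

# Returning x explicitly keeps the (possibly modified) data frame
fixed <- lapply(df_list, function(x) {
  x[setdiff(name, names(x))] <- NA
  x
})

# Overwrite the old list with the corrected result
df_list <- fixed
```

After this, broken holds no data frames at all, while df_list holds df1 unchanged and df2 with an extra column a.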

Upvotes: 1

Duck

Reputation: 39613

I would suggest an approach similar to @GregorThomas's, but first storing in a vector the positions of the data frames that do not contain the variable, and then using lapply() to create the variable in only those elements:

#Data
df1 <- data.frame(a=c(1,2), b=c(3,NA))
df2 <- data.frame(b=c(1,2), c=c(3,NA))
df_list=list(df1, df2)
name <- "a"
#Check
x <- sapply(df_list,function(x) length(which(names(x)==name)))
y <- which(x==0)
#Format new list
df_list[y] <- lapply(df_list[y],function(x) {x[[name]]<-NA;return(x)})

Output:

df_list

[[1]]
  a  b
1 1  3
2 2 NA

[[2]]
  b  c  a
1 1  3 NA
2 2 NA NA

Upvotes: 1

Gregor Thomas

Reputation: 146040

I would use a for loop to modify the data frames in place:

for(i in seq_along(df_list)) {
  if(!name %in% names(df_list[[i]])) {
    df_list[[i]][[name]] = NA
  }
}

You could take a similar approach with lapply, but in this case I find the for loop easier to understand. We need to make sure the lapplied function returns the data frame--either modified or as-is (this is the main difference from your attempt).

df_list = lapply(df_list, function(x) {
  if(! name %in% names(x)) {
    x[[name]] = NA
  }
  return(x)
})
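
Both versions above assume name holds a single column name; recent versions of R raise an error when an if condition has length greater than one. If several names were needed (an assumption that goes beyond the question), a rough sketch could loop over the missing ones. The helper add_missing_cols and the vector wanted are hypothetical names used only for illustration:

```r
# Hypothetical helper: add every column in `cols` that `df` lacks, filled with NA
add_missing_cols <- function(df, cols) {
  for (col in setdiff(cols, names(df))) {
    df[[col]] <- NA
  }
  df  # return the data frame, modified or as-is
}

df1 <- data.frame(a = c(1, 2), b = c(3, NA))
df2 <- data.frame(b = c(1, 2), c = c(3, NA))
df_list <- list(df1, df2)

# Illustrative set of required columns, not part of the question
wanted <- c("a", "d")

# Overwrite the list with the completed data frames
df_list <- lapply(df_list, add_missing_cols, cols = wanted)
```

With a single name, such as wanted <- "a", this should behave like the lapply version above.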

Upvotes: 1
