bosbmgatl

Reputation: 958

R: Find missing columns, add to data frame if missing

I'd like to write some code that would take a given data frame, check to see if any columns are missing, and if so, add the missing columns filled with 0 or NA. Here's what I've got:

> df
   x1 x2 x4
1   0  1  3
2   3  1  3
3   1  2  1

> nameslist <- c("x1","x2","x3","x4")
> miss.names <- !nameslist %in% colnames(df)
> holder <- rbind(nameslist,miss.names)
> miss.cols <- subset(holder[1,], holder[2,] == "TRUE")

Beyond this point, I can't figure out how to add in the missing column ("x3") without hardcoding it. Ideally, I'd want the new, complete data frame to have columns in the same order as nameslist as well.

Any ideas? My current code can be ignored, no problem.

Upvotes: 10

Views: 13407

Answers (3)

Rodrigo Araujo

Reputation: 41

Thanks, everyone. With your help I got this working for a list of data frames (Files) and a vector of the expected column names (ncolunas).

 for (i in serieI) {                          # serieI holds the indices into Files
     if (!identical(colnames(Files[[i]]), ncolunas)) {

         nms  = ncolunas
         df   = Files[[i]]
         aux  = colnames(df)                  # original column names
         aux1 = row.names(df)                 # original row names

         Missing = setdiff(nms, colnames(df)) # names of the missing columns

         for (j in seq_along(Missing)) {      # add one column of zeros per missing name
             df = cbind(df, c(0))
         }
         colnames(df) = c(aux, Missing)       # update column names

         df = df[, order(colnames(df))]       # put columns in alphabetical order
         df = t(as.matrix(df))                # transpose and convert to a matrix
         row.names(df) = aux1                 # update row names
         Files[[i]] = df                      # update the list element
     }
 }
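
For a list of data frames, much the same effect can probably be had without the index bookkeeping by combining setdiff() with lapply(). This is only a sketch under assumptions: Files below is a made-up two-element list and ncolunas a made-up name vector standing in for the real data, the columns end up in the order of ncolunas rather than alphabetical order, and the result stays a data frame instead of being converted to a matrix.

```r
# Made-up stand-ins for the real Files list and ncolunas vector
Files <- list(
  data.frame(a = 1:2, c = 3:4),
  data.frame(b = 5:6)
)
ncolunas <- c("a", "b", "c")

Files <- lapply(Files, function(df) {
  missing <- setdiff(ncolunas, colnames(df))  # columns this data frame lacks
  for (nm in missing) df[[nm]] <- 0           # add each one, filled with zeros
  df[ncolunas]                                # reorder to match ncolunas
})
```

After this runs, every element of Files has the columns a, b, c in that order.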

Upvotes: 0

Josh O'Brien

Reputation: 162321

Here's a straightforward approach:

df <- data.frame(a=1:4, e=4:1)
nms <- c("a", "b", "d", "e")   # Vector of columns you want in this data.frame

Missing <- setdiff(nms, names(df))  # Find names of missing columns
df[Missing] <- 0                    # Add them, filled with '0's
df <- df[nms]                       # Put columns in desired order
#   a b d e
# 1 1 0 0 4
# 2 2 0 0 3
# 3 3 0 0 2
# 4 4 0 0 1
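
The same three steps could be wrapped in a small helper, which also covers the NA fill mentioned in the question. This is a sketch rather than a definitive implementation: the function name add_missing_cols and its fill argument are made up for illustration, and it is applied here to the data frame from the question.

```r
# Hypothetical helper (name and arguments are illustrative, not from any package)
add_missing_cols <- function(df, nms, fill = 0) {
  missing <- setdiff(nms, names(df))            # names of columns not yet present
  if (length(missing) > 0) df[missing] <- fill  # add them, filled with `fill`
  df[nms]                                       # return columns in the order of nms
}

df <- data.frame(x1 = c(0, 3, 1), x2 = c(1, 1, 2), x4 = c(3, 3, 1))
nameslist <- c("x1", "x2", "x3", "x4")

res <- add_missing_cols(df, nameslist, fill = NA)
res
#   x1 x2 x3 x4
# 1  0  1 NA  3
# 2  3  1 NA  3
# 3  1  2 NA  1
```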

Upvotes: 29

Maiasaura

Reputation: 32986

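library(stringr)
df <- data.frame(X1=1:4, X2=1:4, X5=1:4)
df
#   X1 X2 X5
# 1  1  1  1
# 2  2  2  2
# 3  3  3  3
# 4  4  4  4

# Extract the numeric suffix of each column name ("[0-9]+" also handles multi-digit suffixes)
current <- as.numeric(str_extract(names(df), "[0-9]+"))
# Every suffix from the smallest to the largest one present
missing <- seq(min(current), max(current))

# Add the absent columns, filled with 0
df[paste("X", missing[!missing %in% current], sep="")] <- 0

# Show the columns ordered by name
df[, order(colnames(df))]
#   X1 X2 X3 X4 X5
# 1  1  1  0  0  1
# 2  2  2  0  0  2
# 3  3  3  0  0  3
# 4  4  4  0  0  4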
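
One caveat: order(colnames(df)) sorts the names as text, so with ten or more columns X10 would land before X2. A sketch of ordering by the numeric suffix instead, using base R only and assuming every column name is "X" followed by digits (the three-column data frame here is made up for the example):

```r
df <- data.frame(X1 = 1:2, X10 = 1:2, X2 = 1:2)  # made-up example

# Strip everything that is not a digit, then order by the resulting number
suffix <- as.numeric(gsub("[^0-9]", "", names(df)))
df <- df[order(suffix)]
names(df)
# [1] "X1"  "X2"  "X10"
```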

Upvotes: 1
