Reputation: 958
I'd like to write some code that would take a given data frame, check to see if any columns are missing, and if so, add the missing columns filled with 0 or NA. Here's what I've got:
> df
x1 x2 x4
1 0 1 3
2 3 1 3
3 1 2 1
> nameslist <- c("x1","x2","x3","x4")
> miss.names <- !nameslist %in% colnames(df)
> holder <- rbind(nameslist,miss.names)
> miss.cols <- subset(holder[1,], holder[2,] == "TRUE")
Beyond this point, I can't figure out how to add in the missing column ("x3") without hardcoding it. Ideally, I'd want the new, complete data frame to have columns in the same order as nameslist as well.
Any ideas? My current code can be ignored, no problem.
Upvotes: 10
Views: 13407
Reputation: 41
Thanks, guys. With your help I managed to do this for a list of data frames (Files) and a vector of column names (ncolunas).
for (i in serieI) {
  if (!identical(colnames(Files[[i]]), ncolunas)) {
    nms = ncolunas
    df = Files[[i]]
    aux = colnames(df)
    aux1 = row.names(df)
    Missing = setdiff(nms, colnames(df))
    serie = seq_along(Missing) #indices of the missing columns (empty if none are missing)
    for (j in serie) { #loop to add columns of zeros
      df = cbind(df, c(0))
    }
    colnames(df) = c(aux, Missing) #updates column names
    df = df[, order(colnames(df))] #puts columns into alphabetical order
    df = t(as.matrix(df)) #converts to a matrix and transposes it
    row.names(df) = aux1 #updates row names
    Files[[i]] = df #updates the object in the list
  }
}
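For what it's worth, the same idea can be written more compactly with lapply and setdiff. This is only a minimal sketch: it assumes Files is a list of data frames and ncolunas is a character vector of the wanted column names, and the example data below is made up for illustration. Unlike the loop above, it keeps the columns in the order of ncolunas and returns data frames rather than matrices.

```r
# Hypothetical example data standing in for Files and ncolunas
Files <- list(
  data.frame(a = 1:2, c = 3:4),
  data.frame(b = 5:6)
)
ncolunas <- c("a", "b", "c")

Files <- lapply(Files, function(df) {
  Missing <- setdiff(ncolunas, colnames(df))  # wanted columns that are absent
  if (length(Missing) > 0) {
    df[Missing] <- 0                          # add them, filled with zeros
  }
  df[ncolunas]                                # keep the order given in ncolunas
})
```

Note that df[ncolunas] also drops any column that is not listed in ncolunas.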
Upvotes: 0
Reputation: 162321
Here's a straightforward approach:
df <- data.frame(a=1:4, e=4:1)
nms <- c("a", "b", "d", "e") # Vector of columns you want in this data.frame
Missing <- setdiff(nms, names(df)) # Find names of missing columns
df[Missing] <- 0 # Add them, filled with '0's
df <- df[nms] # Put columns in desired order
# a b d e
# 1 1 0 0 4
# 2 2 0 0 3
# 3 3 0 0 2
# 4 4 0 0 1
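Since the question asks for 0 or NA, the steps above could be wrapped in a small function with a configurable fill value. This is just a sketch; add_missing_cols and its fill argument are hypothetical names, not part of base R.

```r
# add_missing_cols is a hypothetical helper name, not part of base R
add_missing_cols <- function(df, nms, fill = 0) {
  Missing <- setdiff(nms, names(df))  # names of missing columns
  if (length(Missing) > 0) {
    df[Missing] <- fill               # add them, filled with `fill`
  }
  df[nms]                             # put columns in desired order
}

df <- data.frame(a = 1:4, e = 4:1)
res <- add_missing_cols(df, c("a", "b", "d", "e"), fill = NA)
```

Here res has the columns a, b, d, e in that order, with b and d filled with NA. Keep in mind that df[nms] also drops any column not named in nms.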
Upvotes: 29
Reputation: 32986
library(stringr)
df <- data.frame(X1=1:4,X2=1:4,X5=1:4)
> df
X1 X2 X5
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
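current <- as.numeric(str_extract(names(df), "[0-9]+")) # numeric suffix of each column name
missing <- seq(min(current), max(current)) # every index from the lowest to the highest
df[paste("X", missing[!missing %in% current], sep="")] <- 0 # add the absent columns, filled with 0
> df[,order(colnames(df))]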
X1 X2 X3 X4 X5
1 1 1 0 0 1
2 2 2 0 0 2
3 3 3 0 0 3
4 4 4 0 0 4
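One caveat: order(colnames(df)) sorts alphabetically, so with two-digit suffixes X10 would land before X2. A possible base R variant that keeps numeric order is sketched below. It assumes every column name has the form X followed by a number; the names full and new_cols are just illustrative.

```r
df <- data.frame(X1 = 1:4, X2 = 1:4, X12 = 1:4)

current <- as.numeric(sub("^X", "", names(df)))       # numeric suffix of each name
full <- paste0("X", seq(min(current), max(current)))  # every name from X1 to X12
new_cols <- setdiff(full, names(df))                  # names not yet in df
if (length(new_cols) > 0) {
  df[new_cols] <- 0                                   # add them, filled with 0
}
df <- df[full]                                        # numeric order: X2 comes before X10
```

Indexing by the full name vector avoids sorting altogether, because the names are generated in the wanted order.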
Upvotes: 1