Reputation: 1677
Loaded a bunch of Excel files as data frames, standarized column names using a dictionnary list and since it's supposed half of columns are destined to be in other data frame, I've tried to split & stack them all using rbind.fill
from dplyr
package.
The issue is that some of them have 90 columns, some 56, some other just 23, so when selecting columns (I got a list of column names for split1, split2) luckily, you got them all, but some other times you don't.
# this is supposed to be in a loop splitting & stacking all df
split1 <- df[ ,c("column1","column2","column3")]
Error in `[.data.frame`(df, , c("column1","column2","column3")) :
undefined columns selected
is there any way of selecting columns from a data frame no matter if the mentioned columns are unavailable?
I don't care if columns are filled with NA or any other value (0, NULL...). Just need to make selection properly to stack all split1, split2....
Found some SO answers for not the exact same issue and using data.table
package. A base R solution is preferred.
Upvotes: 0
Views: 161
Reputation: 1312
sometimes it's not needed a particular trick or code hint. Just reframing the problem gives you a fairly neat solution.
I'll assume the whole bunch of Excel files got all the needed columns (please let me know if this is not the case).
So, given that all files got all columns....why don't stack all files using dplyr::rbind.fill
and after all files are in one data frame, then split.
Pseudocode:
enormous <- data.frame()
for (i in 1:length(dfs)) {
enormous <- rbind.fill(enormous, i)
}
# then split
split1 <- subset columns from enormous
split2 <- subset columns from enormous
It's less code and less computer intensive. Take into account that subsetting columns and then splitting must be done per file, but stacking all is easier (yes, you can use the data.table
trick here) and must be just one splitting step.
is this a valid approach for your question?
Upvotes: 1
Reputation: 886938
We can use setdiff
to find the elements that are not in the column names and assign it to NA
v1 <- c("column1","column2","column3")
nm1 <- setdiff(v1, names(df))
df[nm1] <- NA
df[v1]
For ease of use, this can be wrapped as a function
fCol <- function(vec, dat){
nm1 <- setdiff(vec, names(dat))
dat[nm1] <- NA
dat[vec]
}
fCol(v1, df)
# column1 column2 column3
#1 1 NA NA
#2 2 NA NA
#3 3 NA NA
#4 4 NA NA
#5 5 NA NA
df <- data.frame(column1 = 1:5, col2 = 6:10)
Upvotes: 1