Maximilian

Reputation: 4229

rbind based on column names and exclude non-matching entries

Sample data:

l <- list(x=data.frame(X1=1,X2=2,X3=3,X4=4,X5=5),
          y=data.frame(X1=6,X8=7,X4=8,X9=9,X5=10),
          z=data.frame(X1=11,X2=12,X3=13,X4=14,X5=15)
          )

I would like to rbind this list based on pre-specified column names, so that both the column names and their positions match.

# these are the pre-defined column names we want to `rbind` on; if a list entry does not match, exclude it
col <- c("X1","X2","X3","X4","X5") 

The desired output is a data.frame:

  X1  X2  X3  X4  X5
   1   2   3   4   5
  11  12  13  14  15

EDIT: maybe like this:

do.call(rbind, lapply(l, function(x) x[!any(is.na(match(c("X1","X2","X3","X4","X5"), names(x))))]))
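For reference, here is a minimal sketch of why that one-liner can work, using a small made-up frame `d` that is not part of the sample data. Indexing a data frame with a single logical value recycles it across the columns, so `d[TRUE]` keeps every column and `d[FALSE]` gives a zero-column frame, which the data frame method of `rbind` drops.

```r
# Hypothetical two-column frame, used only for illustration
d <- data.frame(X1 = 1, X2 = 2)

all_cols <- d[TRUE]    # scalar TRUE is recycled: every column is kept
no_cols  <- d[FALSE]   # scalar FALSE is recycled: a zero-column frame

ncol(all_cols)
ncol(no_cols)

# rbind's data frame method drops zero-column arguments before binding
combined <- rbind(no_cols, d, d)
nrow(combined)
```

Note that the kept frames are passed through unchanged, so this relies on every matching entry having exactly the same columns.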

Upvotes: 2

Views: 981

Answers (4)

akrun

Reputation: 887048

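Another option using data.table. rbindlist(fill=TRUE) pads the missing columns with NA, so na.omit then drops the rows that came from non-matching entries (along with any row that already had an NA in those columns):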

library(data.table)#v1.9.5+
na.omit(rbindlist(l, fill=TRUE)[,col, with=FALSE])
#   X1 X2 X3 X4 X5
#1:  1  2  3  4  5
#2: 11 12 13 14 15

Upvotes: 2

shadowtalker

Reputation: 13833

Here's one way to do it:

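match_all_columns <- function (d, col) {
  # keep d only if every requested column is present in it
  if (all(col %in% names(d))) {
    out <- d[col]
  } else {
    out <- NULL
  }
  out
}
# or as a one-liner (an `if` with no `else` returns NULL when the test fails)
match_all_columns <- function (d, col) if (all(col %in% names(d))) d[col]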

matched_data <- lapply(l, match_all_columns, col)
result <- do.call(rbind, matched_data)
result
#   X1 X2 X3 X4 X5
# x  1  2  3  4  5
# z 11 12 13 14 15

rbind knows to just ignore the NULL elements.

edit: I swapped d[, col] with d[col] because a) it looks nicer, b) it prevents the data frame being dropped to a vector if col only has one element, and c) I think it's slightly more performant on large data frames.
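A small sketch of point b), using a made-up two-row frame and a hypothetical single-column selection `one_col`; it only illustrates the difference between the two indexing styles.

```r
# Hypothetical data, used only to show the drop behaviour
d <- data.frame(X1 = c(1, 11), X2 = c(2, 12))
one_col <- "X1"

as_vector  <- d[, one_col]                 # matrix-style indexing drops to a numeric vector
as_frame   <- d[one_col]                   # list-style indexing keeps a data frame
also_frame <- d[, one_col, drop = FALSE]   # drop = FALSE also keeps a data frame

is.data.frame(as_vector)
is.data.frame(as_frame)
is.data.frame(also_frame)
```

If the matrix-style form is preferred for other reasons, `drop = FALSE` should give the same protection.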

Upvotes: 3

Dominic Comtois

Reputation: 10401

Yet another possibility, allowing variation in the columns' order:

output.df <- data.frame(X1=numeric(), X2=numeric(), X3=numeric(),
                        X4=numeric(), X5=numeric())

for(i in seq_along(l)) {
    if(identical(sort(colnames(l[[i]])),sort(colnames(output.df))))
        output.df[nrow(output.df)+1,] <- l[[i]][,colnames(output.df)]
}

output.df

#   X1 X2 X3 X4 X5
# 1  1  2  3  4  5
# 2 11 12 13 14 15
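The same idea can be sketched without growing the data frame inside a loop, assuming that "match" means an entry has exactly the columns in `col`, in any order. The list `l2` and its reordered entry `a` are made up for illustration and are not part of the question's data.

```r
col <- c("X1", "X2", "X3", "X4", "X5")

# Hypothetical list: `a` has the right columns in reverse order, `b` does not match
l2 <- list(a = data.frame(X5 = 5, X4 = 4, X3 = 3, X2 = 2, X1 = 1),
           b = data.frame(X1 = 6, X8 = 7, X4 = 8, X9 = 9, X5 = 10))

# TRUE for entries whose column names form the same set as `col`
same_set <- vapply(l2, function(d) setequal(names(d), col), logical(1))

# Reorder the kept entries to `col`, then bind them in one call
output.df <- do.call(rbind, lapply(l2[same_set], function(d) d[col]))
output.df
```

Here `setequal` plays the role of the `identical(sort(...), sort(...))` comparison, and `d[col]` puts the columns in the requested order before binding.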

Upvotes: 3

Maximilian

Reputation: 4229

This seems to be working too:

do.call(rbind, lapply(l, function(x) x[!any(is.na(match(c("X1","X2","X3","X4","X5"), names(x))))]))
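A somewhat more explicit variant of the same check, as a sketch only: it filters the list first and then selects the columns in `col`. The helper name `has_all_cols` is made up here, and the sample data is repeated so the snippet runs on its own.

```r
l <- list(x = data.frame(X1 = 1, X2 = 2, X3 = 3, X4 = 4, X5 = 5),
          y = data.frame(X1 = 6, X8 = 7, X4 = 8, X9 = 9, X5 = 10),
          z = data.frame(X1 = 11, X2 = 12, X3 = 13, X4 = 14, X5 = 15))
col <- c("X1", "X2", "X3", "X4", "X5")

# Hypothetical helper: TRUE when every column in `col` is present in `d`
has_all_cols <- function(d) all(col %in% names(d))

kept <- Filter(has_all_cols, l)   # drops `y`, which lacks X2 and X3

# Select the columns in `col` order, then bind the remaining entries
result <- do.call(rbind, lapply(kept, function(d) d[col]))
result
```

Because each kept entry is reduced to `d[col]`, this version should also cope with entries that carry extra columns, which the one-liner would pass to `rbind` unchanged.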

Upvotes: 2
