Steve Rowe

Reputation: 19413

How do I understand the warnings from rbind?

If I have two data.frames with the same column names, I can use rbind to combine them into a single data frame. However, if a column is a factor in one and an int in the other, I get a warning like this:

Warning message: In `[<-.factor`(`*tmp*`, ri, value = c(1L, 1L, 0L, 0L, 0L, 1L, 1L, : invalid factor level, NA generated

The following is a simplification of the problem:

t1 <- structure(list(test = structure(c(1L, 1L, 2L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L), .Label = c("False", "True"), class = "factor")), .Names = "test", row.names = c(NA, 
-10L), class = "data.frame")
t2 <- structure(list(test = c(1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L
)), .Names = "test", row.names = c(NA, -10L), class = "data.frame")
rbind(t1, t2)

With a single column this is easy to understand, but when the column is one of a dozen or more factors, it can be difficult. Is there anything in the warning message that tells me which column to look at? Failing that, what is a good technique for finding the column that is in error?

Upvotes: 3

Views: 794

Answers (2)

Steve Rowe

Reputation: 19413

Based on thelatemail's answer, here is a function to compare two data.frames for rbinding:

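mergeCompare <- function(one, two) {
  # Columns that appear in only one of the two data frames
  cat("Distinct items: ", setdiff(names(one), names(two)), setdiff(names(two), names(one)), "\n")
  print("Non-matching items:")
  common <- intersect(names(one), names(two))
  # TRUE marks a shared column whose class differs between the two frames.
  # identical() yields one value per column even for multi-class columns (e.g. ordered factors).
  print(mapply(function(x, y) !identical(class(x), class(y)), one[common], two[common]))
}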
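
With a dozen or more columns, it may help to see only the mismatches along with the class found on each side. The following is a rough sketch of that idea, not part of the original answer: classMismatch is a hypothetical helper name, and the two small data frames are made-up stand-ins for the t1 and t2 in the question.

```r
# Hypothetical helper: list the shared columns whose classes differ,
# together with the class found in each data frame.
classMismatch <- function(one, two) {
  common <- intersect(names(one), names(two))
  # Collapse multi-element class vectors (e.g. "ordered"/"factor") into one string
  describe <- function(x) paste(class(x), collapse = "/")
  cls1 <- vapply(one[common], describe, character(1))
  cls2 <- vapply(two[common], describe, character(1))
  differs <- cls1 != cls2
  data.frame(column = common[differs], one = cls1[differs], two = cls2[differs],
             row.names = NULL, stringsAsFactors = FALSE)
}

# Made-up example data: "test" is a factor in t1 but an integer in t2
t1 <- data.frame(test = factor(c("False", "True")), n = 1:2)
t2 <- data.frame(test = c(1L, 0L), n = 3:4)

# Returns a single row: column "test", one "factor", two "integer"
classMismatch(t1, t2)
```

An empty result would suggest that the shared columns agree in class, so any remaining rbind warning probably has another cause, such as differing factor levels.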

Upvotes: 1

thelatemail

Reputation: 93908

You could knock up a simple little comparison script using class and mapply to find where the rbind will break down due to non-matching data types, e.g.:

one <- data.frame(a=1,b=factor(1))
two <- data.frame(b=2,a=2)

common <- intersect(names(one),names(two))
mapply(function(x,y) identical(class(x), class(y)), one[common], two[common])

#    a     b 
# TRUE FALSE 

Upvotes: 6
