Reputation: 654
Suppose I have 2 data frames:
a1 <- data.frame(a = 1:5, b=2:6)
a2 <- data.frame(a = 1:5, b=c(2:5,NA))
I would like to identify which columns are not identical (I will need the column number later). I thought that this would do the trick:
apply(!a1==a2, 2, sum, na.rm=TRUE)
However, because the last entry in a2 is an NA, it doesn't work: the mismatch in column b is not counted.
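To make the failure concrete, here is a minimal sketch of what the attempt computes, using only the two data frames from the question. The intermediate names cmp and n_diff are introduced here for illustration.

```r
a1 <- data.frame(a = 1:5, b = 2:6)
a2 <- data.frame(a = 1:5, b = c(2:5, NA))

# Comparing two data frames with == gives a logical matrix.
# The last cell of column b is NA, because 6 == NA is NA.
cmp <- a1 == a2

# !NA is still NA, and na.rm = TRUE then drops it from the sum,
# so the one real difference in column b is never counted.
n_diff <- apply(!cmp, 2, sum, na.rm = TRUE)
n_diff
```

Both counts come out as 0, which is why column b is not flagged.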
Upvotes: 0
Views: 55
Reputation: 886938
If you wanted to use sum, you could try
colSums(a1==a2, na.rm=TRUE)!=nrow(a1)
# a b
#FALSE TRUE
Or using your code
apply(a1==a2, 2, sum, na.rm=TRUE)!=nrow(a1)
# a b
#FALSE TRUE
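Since the question asks for the column number, the logical vector can be passed to which(). The sketch below also shows one caveat of this approach as I understand it: a position that is NA in both data frames still counts as a difference, because na.rm = TRUE removes it from the count of matches. The names differs, a3 and same_na are hypothetical, added only for this illustration.

```r
a1 <- data.frame(a = 1:5, b = 2:6)
a2 <- data.frame(a = 1:5, b = c(2:5, NA))

# TRUE for every column where fewer than nrow(a1) cells compare equal
differs <- colSums(a1 == a2, na.rm = TRUE) != nrow(a1)

# Column numbers of the differing columns (names are kept)
which(differs)

# Caveat: NA compared with NA is also NA, so a column containing NA
# is flagged even when both data frames hold exactly the same values.
a3 <- a2
same_na <- colSums(a2 == a3, na.rm = TRUE) != nrow(a2)
same_na
```

If NA against NA should count as equal, the comparison needs an explicit is.na() check on both sides.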
Upvotes: 0
Reputation: 99321
Not sure why you're using sum, but to identify which columns are not identical you could use mapply with identical and negate the result.
which(!mapply(identical, a1, a2))
# b
# 2
This gives the column number. Or, more simply, for use in a column subset:
!mapply(identical, a1, a2)
# a b
# FALSE TRUE
Just as a note, the word identical has a meaning in R that may be different from the result of ==, so it's possible you may need to clarify your question a bit.
x <- 1
y <- 1L
x == y
# [1] TRUE
identical(x, y)
# [1] FALSE
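If the looser == semantics are what is wanted (so that 1 and 1L count as equal) while still handling NA, one possible sketch is below. The helper name cols_differ is hypothetical, and it assumes both data frames have the same dimensions and column order. It treats NA against NA as equal and NA against a value as a difference.

```r
# Hypothetical helper: column numbers where x and y differ under ==,
# with NA vs NA treated as equal and NA vs a value as different.
cols_differ <- function(x, y) {
  both_na <- is.na(x) & is.na(y)   # logical matrices for data frames
  eq <- (x == y) | both_na         # NA | TRUE is TRUE, NA | FALSE is NA
  eq[is.na(eq)] <- FALSE           # remaining NA: exactly one side is missing
  which(colSums(!eq) > 0)
}

a1 <- data.frame(a = 1:5, b = 2:6)
a2 <- data.frame(a = 1:5, b = c(2:5, NA))

cols_differ(a1, a2)   # flags column b
cols_differ(a2, a2)   # no columns flagged, despite the NA
```

Unlike mapply with identical, this ignores differences in storage type, so an integer column and a double column holding the same numbers are not reported.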
Upvotes: 2