mauna

Reputation: 1118

Why aren't these two R objects identical?

I was reading the book 'Data Mining with R' and came across this code:

library(DMwR)

clean.algae <- knnImputation(algae, k = 10)
x <- sapply(names(clean.algae)[12:18],
            function(x,names.attrs) {
              f <- as.formula(paste(x,"~ ."))
              dataset(f,clean.algae[,c(names.attrs,x)],x)
            },
            names(clean.algae)[1:11])

I thought x could be rewritten as:

y <- sapply(names(clean.algae)[12:18],
            function(x) {
              f <- as.formula(paste(x,"~ ."))
              dataset(f,clean.algae[,c(names(clean.algae)[1:11],x)],x)
            }
)

However, identical(x,y) returns FALSE.

I decided to investigate why by restricting my attention to just the first element of these lists.

I found that:

identical(attributes(x[[1]])$data,
          attributes(y[[1]])$data)
[1] FALSE

However:

which(!(attributes(x[[1]])$data == attributes(y[[1]])$data))
integer(0)

To me, this means that all elements of the two data frames are equal, so the data frames must be identical. Why is this not the case?

I also have a similar question about the objects' formula attribute:

> identical(attributes(x[[1]])$formula,
+           attributes(y[[1]])$formula)
[1] FALSE
> 
> attributes(x[[1]])$formula == attributes(y[[1]])$formula
[1] TRUE

Upvotes: 1

Views: 432

Answers (1)

Ben Bolker

Reputation: 226077

tl;dr: the objects are not identical because their associated environments differ, both in the @formula slots of the objects' components and in the terms attributes of the @data slots. As @ThomasK points out in the comments above, all.equal() is good enough (and preferred) for most comparison purposes.
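On the question of why == finds no differences: == on data frames compares cell values only, whereas identical() also compares attributes. A minimal sketch with made-up data (the data frame and the "note" attribute are hypothetical and have nothing to do with DMwR):

```r
# Two data frames with the same values; b carries one extra attribute.
a <- data.frame(v = 1:3)
b <- a
attr(b, "note") <- "extra metadata"   # hypothetical attribute, for illustration only

values_equal <- all(a == b)               # compares cell values only
objects_identical <- identical(a, b)      # also compares attributes

print(values_equal)        # TRUE
print(objects_identical)   # FALSE
```

So an empty result from which(!(... == ...)) says the values match, not that the objects are identical.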

Formulas are equal but not identical:

identical(x$a1@formula,y$a1@formula)
## [1] FALSE
all.equal(x$a1@formula,y$a1@formula)
## [1] TRUE

Environments differ:

environment(x$a1@formula)
## <environment: 0x9a408dc>
environment(y$a1@formula)
## <environment: 0x9564aa4>

Setting the environments to be identical makes the formulae identical:

environment(x$a1@formula) <- .GlobalEnv
environment(y$a1@formula) <- .GlobalEnv
identical(x$a1@formula,y$a1@formula)
## [1] TRUE
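The same effect can be reproduced without DMwR. This is a minimal base-R sketch, assuming only that the formulas are created inside separate function calls (as happens inside sapply); make_formula is a hypothetical helper, not part of the original code:

```r
# Each call to make_formula() evaluates `y ~ x` in a fresh function
# environment, and the formula remembers that environment.
make_formula <- function() y ~ x

f1 <- make_formula()
f2 <- make_formula()

same_text   <- identical(deparse(f1), deparse(f2))  # the formula text matches
same_before <- identical(f1, f2)                    # environments differ

# Point both formulas at the same environment.
environment(f1) <- globalenv()
environment(f2) <- globalenv()
same_after <- identical(f1, f2)

print(c(same_text = same_text, same_before = same_before, same_after = same_after))
```

Here same_before should be FALSE and same_after TRUE, mirroring the behaviour of the @formula slots above.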

However, other things still differ: identical(x$a1,y$a1) is still FALSE.

Digging some more:

for (i in slotNames(x$a1))  {
    print(i)
    print(identical(slot(x$a1,i),slot(y$a1,i)))
}
## [1] "data"
## [1] FALSE
## [1] "name"
## [1] TRUE
## [1] "formula"
## [1] TRUE

Digging deeper into the data slot (with judicious use of str()) turns up more environments, this time associated with terms (closely related to formulae):

dx <- x$a1@data
dy <- y$a1@data
environment(attr(dx,"terms"))
## <environment: 0x9a408dc>
environment(attr(dy,"terms"))
## <environment: 0x9564aa4>

Setting these to the same environment should make x$a1 and y$a1 identical, but I haven't tested this.
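The idea can at least be checked on plain model frames. The sketch below uses base R only; make_mf and the toy data are hypothetical, and whether DMwR's dataset() builds its data slot this way is an assumption, so it does not confirm the result for x$a1 and y$a1 themselves:

```r
# A model frame stores a "terms" attribute that carries the environment
# of the formula it was built from.
make_mf <- function(df) {
  f <- y ~ x                  # formula created in this call's environment
  model.frame(f, data = df)
}

toy <- data.frame(x = 1:3, y = c(2, 4, 6))   # made-up data
mf1 <- make_mf(toy)
mf2 <- make_mf(toy)

before <- identical(mf1, mf2)   # terms environments differ

# Give both terms attributes the same environment.
environment(attr(mf1, "terms")) <- globalenv()
environment(attr(mf2, "terms")) <- globalenv()
after <- identical(mf1, mf2)

print(c(before = before, after = after))
```

If the data slots behave like these model frames, applying the same assignment to dx and dy (and writing them back into the slots) would be the analogous step.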

Upvotes: 5
