I was reading the book 'Data Mining with R' and came across this code:
library(DMwR)
clean.algae <- knnImputation(algae, k = 10)
x <- sapply(names(clean.algae)[12:18],   # response columns a1..a7
            function(x, names.attrs) {
              f <- as.formula(paste(x, "~ ."))
              dataset(f, clean.algae[, c(names.attrs, x)], x)
            },
            names(clean.algae)[1:11])    # predictor columns, passed on via ...
I thought x could be rewritten as:
y <- sapply(names(clean.algae)[12:18],
            function(x) {
              f <- as.formula(paste(x, "~ ."))
              dataset(f, clean.algae[, c(names(clean.algae)[1:11], x)], x)
            })
However, identical(x,y) returns FALSE.
I decided to investigate why by restricting my attention to just the first element of these lists. I found that:
identical(attributes(x[[1]])$data,
attributes(y[[1]])$data)
[1] FALSE
However:
which(!(attributes(x[[1]])$data == attributes(y[[1]])$data))
integer(0)
To me, this means all elements in the two data frames are equal, so the data frames should be identical. Why is this not the case?
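For reference, here is a minimal sketch of how == and identical() can disagree, using small made-up data frames rather than the algae data (the names a, b and the attribute "note" are purely illustrative): == on data frames compares values element by element and ignores attributes, while identical() also compares every attribute.

```r
# Two made-up data frames holding the same values
a <- data.frame(u = 1:3, v = c(2.5, 3.5, 4.5))
b <- a
attr(b, "note") <- "extra metadata"  # illustrative attribute, set only on b

which(!(a == b))  # integer(0): every value matches
identical(a, b)   # FALSE: the attribute sets differ
```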
I also have a similar question about the object's formula attribute:
identical(attributes(x[[1]])$formula,
          attributes(y[[1]])$formula)
[1] FALSE
attributes(x[[1]])$formula == attributes(y[[1]])$formula
[1] TRUE
tl;dr: the non-identicality does indeed come from differences in the associated environments, both of the @formula slots of the components of the objects and of the terms attributes of the @data slots. As @ThomasK points out in the comments above, for most comparison purposes all.equal() is good enough (and preferred).
Formulas are equal but not identical:
identical(x$a1@formula,y$a1@formula)
## [1] FALSE
all.equal(x$a1@formula,y$a1@formula)
## [1] TRUE
Environments differ:
environment(x$a1@formula)
## <environment: 0x9a408dc>
environment(y$a1@formula)
## <environment: 0x9564aa4>
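As for why the environments differ, a plausible explanation: as.formula() attaches the environment it is called from, and each call to the anonymous function passed to sapply() runs in a fresh evaluation frame, so every formula gets its own environment no matter which of the two versions built it. A minimal sketch reproducing this without DMwR (make_formula() is a hypothetical stand-in for that anonymous function):

```r
# Hypothetical stand-in for the anonymous function passed to sapply()
make_formula <- function(response) {
  # as.formula() sets the formula's environment to this call's frame
  as.formula(paste(response, "~ ."))
}

f1 <- make_formula("a1")
f2 <- make_formula("a1")

identical(f1, f2)                            # FALSE
isTRUE(all.equal(f1, f2))                    # TRUE: same formula text
identical(environment(f1), environment(f2))  # FALSE: two distinct frames
```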
Setting the environments to be identical makes the formulae identical:
environment(x$a1@formula) <- .GlobalEnv
environment(y$a1@formula) <- .GlobalEnv
identical(x$a1@formula,y$a1@formula)
## [1] TRUE
However, there's more stuff that's different: identical(x$a1,y$a1) is still FALSE.
Digging some more:
for (i in slotNames(x$a1)) {
  print(i)
  print(identical(slot(x$a1, i), slot(y$a1, i)))
}
## [1] "data"
## [1] FALSE
## [1] "name"
## [1] TRUE
## [1] "formula"
## [1] TRUE
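The same kind of digging can be done on attributes rather than S4 slots. Below is a minimal sketch of a hypothetical helper, differing_attrs(), that reports which attributes two objects disagree on under identical(). Since dataset() isn't reproduced here, the objects are model frames built by model.frame() inside a function on made-up data; that is an assumption about how a terms attribute typically ends up on a data frame, not a claim about DMwR's internals.

```r
# Hypothetical helper: names of attributes on which identical() disagrees
differing_attrs <- function(a, b) {
  nms <- union(names(attributes(a)), names(attributes(b)))
  same <- vapply(nms, function(n) {
    identical(attr(a, n, exact = TRUE), attr(b, n, exact = TRUE))
  }, logical(1))
  nms[!same]
}

# Made-up data; model.frame() attaches a "terms" attribute whose
# environment is the formula's environment (here, make_mf()'s frame)
dat <- data.frame(y = c(1, 2, 3), x = c(4, 5, 6))
make_mf <- function() model.frame(as.formula("y ~ ."), data = dat)

mf1 <- make_mf()
mf2 <- make_mf()
differing_attrs(mf1, mf2)  # "terms"
```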
Digging deeper into the data slot (also with judicious use of str()) turns up more environments, this time associated with terms (closely related to formulae):
dx <- x$a1@data
dy <- y$a1@data
environment(attr(dx,"terms"))
## <environment: 0x9a408dc>
environment(attr(dy,"terms"))
## <environment: 0x9564aa4>
Setting these equal to each other should lead to identicality between x$a1 and y$a1, but I haven't tested it.
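A minimal sketch of that step, again using model.frame() on made-up data as a stand-in for the @data slot (so it does not show that dataset() adds nothing else): once both terms attributes point at the same environment, identical() agrees.

```r
# Made-up data; make_mf() is a hypothetical stand-in for building @data
dat <- data.frame(y = c(1, 2, 3), x = c(4, 5, 6))
make_mf <- function() model.frame(as.formula("y ~ ."), data = dat)

dx <- make_mf()
dy <- make_mf()
identical(dx, dy)  # FALSE: the terms environments differ

# Point both terms attributes at the same environment
environment(attr(dx, "terms")) <- .GlobalEnv
environment(attr(dy, "terms")) <- .GlobalEnv
identical(dx, dy)  # TRUE
```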