Reputation: 132
library(MASS)
example(lda)
plot(z)
How can I access all the points in z? I want to know the values of every point along LD1 and LD2 depending on their Sp (c,s,v).
Upvotes: 3
Views: 2902
Reputation: 707
Some more details to the nice example by GavinSimpson
## follow example from ?lda
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
Sp = rep(c("s","c","v"), rep(50,3)))
set.seed(1)
train <- sample(1:150, 75)
z <- lda(Iris[train,-5], grouping = Iris[train,5], prior = c(1,1,1)/3)
## get the whole prediction object
pred <- predict(z)
If you want to understand the magic of predict.lda
, here is how you can calculate (predict) the matrix of LD scores by hand: All you have to do is to center the original data (or new data of the same kind) using the centering vector (LDA uses the mean of group means) of the original data colMeans(z$means)
and multiply each centered observation by the LD loadings (linear coefficients) stored in z$scaling
.
Predict by hand
pred.self <- scale(Iris[train,-5], center = colMeans(z$means), scale = FALSE) %*% z$scaling
Verify using predict.lda()
> all.equal(pred$x, pred.self) # it's the same!
[1] TRUE
Verify using plot.lda()
plot(z, dimen = 2)
points(pred.self, col = "blue") # corresponds to plot.lda() output
A closer look at the predict.lda
function
getAnywhere(predict.lda)
reveals that since neither the input data nor the LD scores are stored in the lda
object, the predict.lda
function has to look for the input data in the R environment.
newdata <- eval.parent(z$call$x)
all.equal(newdata, Iris[train,-5]) # it's the same!
Upvotes: 0
Reputation: 174788
What you are looking for is computed as part of the predict()
method of objects of class "lda"
(see ?predict.lda
). It is returned as component x
of the object produced by predict(z)
:
## follow example from ?lda
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
Sp = rep(c("s","c","v"), rep(50,3)))
set.seed(1) ## remove this line if you want it to be pseudo random
train <- sample(1:150, 75)
table(Iris$Sp[train])
## your answer may differ
## c s v
## 22 23 30
z <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)
## get the whole prediction object
pred <- predict(z)
## show first few sample scores on LDs
head(z$x)
the last line shows the first few rows of the object scores on the linear discriminants
> head(pred$x)
LD1 LD2
40 -8.334664 0.1348578
56 2.462821 -1.5758927
85 2.998319 -0.6648073
134 4.030165 -1.4724530
30 -7.511226 -0.6519301
131 6.779570 -0.8675742
These scores can be plotted like so
plot(LD2 ~ LD1, data = pred$x)
producing the following plot (for this training sample!)
Upvotes: 5
Reputation: 60452
When you calling the function plot(z)
, you are actually calling the function plot.lda
- this is an S3 method. Basically, the object z
has class lda
:
class(z)
We can look at the actual function that is being used:
getS3method("plot", "lda")
This turns out to be rather involved. But the key points are:
x = z
Terms <- x$terms
data <- model.frame(x)
X <- model.matrix(delete.response(Terms), data)
g <- model.response(data)
xint <- match("(Intercept)", colnames(X), nomatch = 0L)
X <- X[, -xint, drop = FALSE]
means <- colMeans(x$means)
X <- scale(X, center = means, scale = FALSE) %*% x$scaling
We can no plot as before:
plot(X[,1], X[,2])
Proviso There might well be an easier way of getting what you want - I just don't know the lda
function.
Upvotes: 1