Federico C
Federico C

Reputation: 132

Access scores for observation on linear discriminants in LDA using MASS:lda()

library(MASS)
example(lda)
plot(z)

How can I access all the points in z? I want to know the values of every point along LD1 and LD2 depending on their Sp (c,s,v).

Upvotes: 3

Views: 2902

Answers (3)

scrameri
scrameri

Reputation: 707

Some more details to the nice example by GavinSimpson

## follow example from ?lda
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
                   Sp = rep(c("s","c","v"), rep(50,3)))
set.seed(1)
train <- sample(1:150, 75)
z <- lda(Iris[train,-5], grouping = Iris[train,5], prior = c(1,1,1)/3)

## get the whole prediction object
pred <- predict(z)

If you want to understand the magic of predict.lda, here is how you can calculate (predict) the matrix of LD scores by hand: All you have to do is to center the original data (or new data of the same kind) using the centering vector (LDA uses the mean of group means) of the original data colMeans(z$means) and multiply each centered observation by the LD loadings (linear coefficients) stored in z$scaling.

Predict by hand

pred.self <- scale(Iris[train,-5], center = colMeans(z$means), scale = FALSE) %*% z$scaling

Verify using predict.lda()

> all.equal(pred$x, pred.self) # it's the same!
[1] TRUE

Verify using plot.lda()

plot(z, dimen = 2)
points(pred.self, col = "blue") # corresponds to plot.lda() output

image output

A closer look at the predict.lda function

getAnywhere(predict.lda)

reveals that since neither the input data nor the LD scores are stored in the lda object, the predict.lda function has to look for the input data in the R environment.

newdata <- eval.parent(z$call$x)
all.equal(newdata, Iris[train,-5]) # it's the same!

Upvotes: 0

Gavin Simpson
Gavin Simpson

Reputation: 174788

What you are looking for is computed as part of the predict() method of objects of class "lda" (see ?predict.lda). It is returned as component x of the object produced by predict(z):

## follow example from ?lda
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
                   Sp = rep(c("s","c","v"), rep(50,3)))
set.seed(1) ## remove this line if you want it to be pseudo random
train <- sample(1:150, 75)
table(Iris$Sp[train])
## your answer may differ
##  c  s  v
## 22 23 30
z <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)

## get the whole prediction object
pred <- predict(z)
## show first few sample scores on LDs
head(z$x)

the last line shows the first few rows of the object scores on the linear discriminants

> head(pred$x)
          LD1        LD2
40  -8.334664  0.1348578
56   2.462821 -1.5758927
85   2.998319 -0.6648073
134  4.030165 -1.4724530
30  -7.511226 -0.6519301
131  6.779570 -0.8675742

These scores can be plotted like so

plot(LD2 ~ LD1, data = pred$x)

producing the following plot (for this training sample!)

lda scores plot

Upvotes: 5

csgillespie
csgillespie

Reputation: 60452

When you calling the function plot(z), you are actually calling the function plot.lda - this is an S3 method. Basically, the object z has class lda:

class(z)

We can look at the actual function that is being used:

getS3method("plot", "lda")

This turns out to be rather involved. But the key points are:

x = z
Terms <- x$terms
data <- model.frame(x)
X <- model.matrix(delete.response(Terms), data)
g <- model.response(data)
xint <- match("(Intercept)", colnames(X), nomatch = 0L)
X <- X[, -xint, drop = FALSE]
means <- colMeans(x$means)
X <- scale(X, center = means, scale = FALSE) %*% x$scaling

We can no plot as before:

plot(X[,1], X[,2])

Proviso There might well be an easier way of getting what you want - I just don't know the lda function.

Upvotes: 1

Related Questions