daisybeats

Reputation: 237

Mahalanobis distance with multiple observations, variables and groups

For the iris data set, I am trying to find the Mahalanobis distances between each pair of species. I tried the following:

group <- matrix(iris$Species) 
group <- t(group[,-5])

variables <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
varibles <- as.matrix(iris[,variables])

mahala_sq <- pairwise.mahalanobis(x=variables, grouping=group)

But I get the error message:

Error in pairwise.mahalanobis(x = variables, grouping = group) : nrow(x) and length(grouping) are different

Upvotes: 0

Views: 813

Answers (2)

HFlavio12

Reputation: 1

To calculate Mahalanobis distance for each pair of species, you can use the cmahalanobis package:

library(cmahalanobis)

data(iris)

groups <- split(iris, iris$Species)

cmahalanobis(groups)

I'm the author of this package.

Upvotes: 0

Ben Bolker

Reputation: 226247

This works:

HDMD::pairwise.mahalanobis(x=iris[,1:4], grouping=iris$Species)
  • x should be a numeric matrix of observations (columns=variables, rows=observations)
  • grouping should be a "vector of characters or values designating group classification for observations" with length equal to nrow(x)
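As a quick sanity check of the two requirements above, here is a minimal base-R sketch that inspects the shapes of the inputs before calling the function. The names `x`, `grouping` and `bad_x` are chosen here for illustration; `bad_x` stands in for the character vector of column names that the question ended up passing as `x`.

```r
# Inputs in the shape described above
x <- as.matrix(iris[, 1:4])   # numeric matrix: rows = observations, columns = variables
grouping <- iris$Species      # one group label per observation

dim(x)             # 150 rows, 4 columns
length(grouping)   # 150
stopifnot(is.numeric(x), nrow(x) == length(grouping))

# What the question passed instead: a character vector of column names
bad_x <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
is.numeric(bad_x)  # FALSE
length(bad_x)      # 4, which cannot line up with 150 group labels
```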

I realized in editing your question that the problem stems from a typo (you assigned varibles instead of variables); if you fix that typo, your code seems to work (at least doesn't throw an error). (I still claim that my solution is simpler ...)
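For reference, a sketch of the question's approach with the typo out of the way. It is not the asker's exact code: the names `vars`, `x` and `group` are chosen here so that the column names and the data matrix cannot be confused, and the Species factor is passed directly as the grouping. The HDMD call is skipped if the package is not installed.

```r
vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
x <- as.matrix(iris[, vars])   # the data matrix, stored under its own name
group <- iris$Species          # factor with one label per row of x

# Run the pairwise computation only when HDMD is available
if (requireNamespace("HDMD", quietly = TRUE)) {
  mahala_sq <- HDMD::pairwise.mahalanobis(x = x, grouping = group)
  str(mahala_sq)               # inspect the structure of the returned object
}
```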

If you want to be a little more careful, you could use x <- iris[colnames(iris) != "Species"] (or a subset(select=) or dplyr::select() analog) to refer to the omitted column by name rather than position.
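The name-based alternatives might look like the following sketch. The object names `x1`, `x2` and `x3` are illustrative, the base-R version compares against `colnames(iris)`, and the dplyr variant runs only if dplyr is installed.

```r
# Base R: keep every column whose name is not "Species"
x1 <- iris[colnames(iris) != "Species"]

# Base R: subset() with a negative select
x2 <- subset(iris, select = -Species)

names(x1)                        # the four measurement columns
identical(names(x1), names(x2))  # both approaches keep the same columns

# dplyr analog, run only when dplyr is available
if (requireNamespace("dplyr", quietly = TRUE)) {
  x3 <- dplyr::select(iris, -Species)
  print(names(x3))
}
```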

If you want (for some reason) to run this analysis with a single response variable, you need to use drop=FALSE to prevent a one-column matrix from being collapsed to a vector, i.e. use x=iris[,1,drop=FALSE]
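The effect of drop=FALSE can be seen directly in base R; `v` and `m` are names chosen for this small illustration.

```r
v <- iris[, 1]                 # single column collapses to a plain numeric vector
m <- iris[, 1, drop = FALSE]   # single column keeps its two-dimensional shape

is.null(dim(v))   # TRUE: no row/column structure is left
dim(m)            # 150 rows, 1 column
nrow(m) == length(iris$Species)   # TRUE: still one row per group label
```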

Upvotes: 3
