Joshua Rosenberg

Reputation: 4226

Error in summary.manova - residuals have rank order deficiency

I am trying to carry out a MANOVA. There are 7 dependent variables and a categorical independent variable representing 6 groups.

The data are available here: http://pastebin.com/fqXNjWtr

Click "download" above the text. I am reading the file into R like this (the downloaded file should have the same name for you; I'm on macOS):

> df <- read.csv("~/downloads/fqXNjWtr.txt", stringsAsFactors = F)
> str(df)

'data.frame':   244 obs. of  8 variables:
 $ var1              : num  0.3 0 0.312 0 0.643 ...
 $ var2              : num  0 0.125 0 0.375 0.0714 ...
 $ var3              : num  0 0.0625 0.0625 0 0.0714 ...
 $ var4              : num  0.2 0.3125 0.0625 0.0625 0 ...
 $ var5              : num  0.1 0.25 0.438 0.188 0 ...
 $ var6              : num  0.2 0.0625 0.125 0.0625 0.0714 ...
 $ var7              : num  0.2 0.188 0 0.312 0.143 ...
 $ cluster_assignment: int  1 4 2 6 1 4 3 3 4 6 ...

I am then creating the dependent variable, DV:

> df$DV <- as.matrix(df[, 1:7])

I am then carrying out the MANOVA:

> mv_out <- manova(DV ~ cluster_assignment, data = df)
> mv_out
Call:
   manova(DV ~ cluster_assignment, data = df)

Terms:
                cluster_assignment Residuals
resp 1                    5.160838  6.738524
resp 2                    3.384101  3.622020
resp 3                    0.000200  3.365565
resp 4                    0.065469  2.743549
resp 5                    0.889180  8.019733
resp 6                    0.442187  5.884827
resp 7                    3.133188  7.736993
Deg. of Freedom                  1       242

Residual standard errors: 0.1668686 0.1223398 0.1179292 0.1064752 0.1820423 0.1559406 0.1788045
Estimated effects may be unbalanced

When I then try the summary() function, I get this error:

> summary(mv_out)
Error in summary.manova(mv_out) : residuals have rank 6 < 7

Based on some other posts, this error seems to mean either that there are too few observations for the number of variables, or that some of the variables are multicollinear. But neither seems to be the case with these data:

> cor(df[, 1:7])

            var1         var2        var3         var4        var5        var6       var7
var1  1.00000000 -0.417605243 -0.05274197 -0.118358341 -0.25617705  0.06089533 -0.4360312
var2 -0.41760524  1.000000000 -0.07181878  0.008873035 -0.29523300 -0.33954011  0.1958746
var3 -0.05274197 -0.071818782  1.00000000  0.131137673 -0.11624079 -0.14408909 -0.2951076
var4 -0.11835834  0.008873035  0.13113767  1.000000000 -0.14361455 -0.24308229 -0.1491373
var5 -0.25617705 -0.295233000 -0.11624079 -0.143614554  1.00000000 -0.03180183 -0.2383027
var6  0.06089533 -0.339540114 -0.14408909 -0.243082287 -0.03180183  1.00000000 -0.3215075
var7 -0.43603124  0.195874568 -0.29510761 -0.149137349 -0.23830275 -0.32150753  1.0000000

I'm puzzled about what may be going on.

Upvotes: 2

Views: 11697

Answers (2)

Xben

Reputation: 31

The DVs are not of full rank: rowSums(df$DV) shows that every row adds up to the same constant. As danielson pointed out, this violates MANOVA assumptions. Data with this "parts of a whole" structure are often called compositional data. You can find tools and learn more about them on the following website: http://www.compositionaldata.com/
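To see why pairwise correlations look harmless while the residuals are still rank deficient, here is a minimal sketch. It uses synthetic proportions as a stand-in for the pastebin data (the names raw, DV and grp are made up for illustration), so the numbers will not match the question's, but the mechanism should be the same: when every row sums to a constant and the model has an intercept, the residual columns sum to zero.

```r
# Synthetic stand-in for the question's data (an assumption, not the real file):
# 244 rows of 7 proportions that sum to 1, plus a 6-level grouping factor.
set.seed(1)
raw <- matrix(rexp(244 * 7), ncol = 7)
DV  <- raw / rowSums(raw)                 # each row now sums to 1
grp <- factor(sample(1:6, 244, replace = TRUE))

range(rowSums(DV))                        # constant row sums

# No pair of columns is perfectly correlated ...
max(abs(cor(DV)[upper.tri(cor(DV))]))

# ... yet the seven columns are jointly linearly dependent, so the
# residual matrix has rank 6 rather than 7.
fit <- lm(DV ~ grp)
res_rank <- qr(residuals(fit))$rank
res_rank
```

In other words, cor() only checks variables two at a time, whereas the rank check in summary.manova looks at all seven jointly.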

As a quick solution, though, I recommend applying an isometric log-ratio transformation (for instance, the ilr function in the compositions package in R) to the DV before fitting the MANOVA model. This should avoid both the error message and the violated MANOVA assumption.

library(compositions)
mv_out <- manova(ilr(clo(DV)) ~ cluster_assignment, data = df)
summary(mv_out)

This should give you a fair solution.
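For intuition about what the transformation does, here is a rough base-R sketch of an isometric log-ratio transform built from Helmert contrasts. It is not the code of the compositions package, and the basis it uses may differ from the one ilr() picks by default; the MANOVA test statistics should not depend on that choice, since they are unchanged by invertible linear transformations of the responses. The data are synthetic and strictly positive (comp, grp and ilr_manual are illustrative names). The real data contain zeros, as the str() output in the question shows, and those would need to be handled before taking logs.

```r
# Synthetic, strictly positive compositions (assumption: stands in for df$DV).
set.seed(1)
raw  <- matrix(rexp(244 * 7), ncol = 7)
comp <- raw / rowSums(raw)
grp  <- factor(sample(1:6, 244, replace = TRUE))

# Centred log-ratio: subtract each row's mean log, so every row sums to 0.
clr <- log(comp) - rowMeans(log(comp))

# Orthonormal basis for the 6-dimensional space of zero-sum vectors,
# obtained by normalising the columns of the 7 x 6 Helmert contrast matrix.
H <- contr.helmert(7)
V <- sweep(H, 2, sqrt(colSums(H^2)), "/")

# Isometric log-ratio coordinates: 7 parts become 6 unconstrained columns.
ilr_manual <- clr %*% V

fit <- manova(ilr_manual ~ grp)
s   <- summary(fit)                       # no rank error: residuals have rank 6 of 6
s
```

The key point is that the seven constrained parts are re-expressed as six free coordinates, so the sum-to-one constraint no longer shows up as a rank deficiency.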

Upvotes: 3

danielson

Reputation: 1029

You can get past this error by setting the tol argument (see ?summary.manova). df$DV fails the rank-deficiency check at the default tol = 1e-7 because the rowSums are all 1. This might not produce the results you intended, though: note the negative approx F in the output below, which is not a valid F statistic.

summary(mv_out,tol=0)
                       Df Pillai approx F num Df den Df Pr(>F)
df$cluster_assignment   1 1.2106  -193.79      7    236       
Residuals             242     
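A side observation: the Df of 1 in the output above suggests that cluster_assignment is being treated as a numeric covariate rather than as a grouping variable with six levels (str() in the question reports it as int). If a comparison of six groups is intended, wrapping it in factor() is probably what is wanted. The sketch below uses made-up normal data (y is a hypothetical response matrix, not the question's DV) purely to show the difference in degrees of freedom.

```r
# Made-up data for illustration only: 3 unrelated responses, 6 integer group codes.
set.seed(1)
y <- matrix(rnorm(244 * 3), ncol = 3)
cluster_assignment <- sample(1:6, 244, replace = TRUE)

fit_num <- manova(y ~ cluster_assignment)          # integer codes: one slope
fit_fac <- manova(y ~ factor(cluster_assignment))  # six groups: five contrasts

df_num <- summary(fit_num)$stats[1, "Df"]
df_fac <- summary(fit_fac)$stats[1, "Df"]
c(numeric = df_num, factor = df_fac)
```

With the real data, the same change would be df$cluster_assignment <- factor(df$cluster_assignment) before fitting.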

Upvotes: 5
