Reputation: 143
I am running a CCA of some ecological data with ~50 sites and several hundred species. I know that you have to be careful when your number of explanatory variables approaches your number of samples. I have 23 explanatory variables, so this isn't a problem for me, but I have also heard that using too many explanatory variables can start to "un-constrain" the CCA.
Are there any guidelines about how many explanatory variables are appropriate? So far, I have just plotted them all and then removed the ones that appear to be redundant (leaving me with 8). Can I use the inertia values to help inform/justify this?
Thanks
Upvotes: 0
Views: 768
Reputation: 3722
This is the same question as asking "how many variables are too many for regression analysis?". Not "almost the same", but exactly the same: CCA is an ordination of the fitted values of a linear regression. In the most severe cases you can over-fit. In CCA this is evident when the first eigenvalues of CCA and (unconstrained) CA are almost identical and the ordinations look similar in the first dimensions (you can use Procrustes analysis to check this). The extreme case would be that the residual variation disappears, but in ordination you focus on the first dimensions, and there the constraints can get lost much earlier than in the later constrained axes or in the residuals. More importantly: you must see CCA as a kind of regression analysis and have the same attitude to constraints as to explanatory (independent) variables in regression. If you have no prior hypothesis to study, you have all the model selection problems of regression analysis plus the problems of multivariate ordination, but these are non-technical problems that should be handled somewhere other than Stack Overflow.
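To make the eigenvalue check concrete, here is a minimal NumPy sketch, not a substitute for a proper ordination package. It assumes the textbook formulation: CA as the SVD of the standardized residuals of the abundance table, and CCA as the SVD of those residuals after a weighted projection onto the constraints. The data are synthetic (Poisson counts and pure-noise constraints), and the function names `standardized_residuals`, `ca_eigenvalues` and `cca_eigenvalues` are made up for this illustration.

```python
import numpy as np

def standardized_residuals(Y):
    """Return the standardized residual matrix used by CA, plus the row masses."""
    P = Y / Y.sum()
    r = P.sum(axis=1)               # row (site) masses
    c = P.sum(axis=0)               # column (species) masses
    expected = np.outer(r, c)
    return (P - expected) / np.sqrt(expected), r

def ca_eigenvalues(Y):
    """Eigenvalues of unconstrained CA (squared singular values)."""
    Q, _ = standardized_residuals(Y)
    return np.linalg.svd(Q, compute_uv=False) ** 2

def cca_eigenvalues(Y, X):
    """Eigenvalues of CCA: CA of the weighted-regression fitted values."""
    Q, r = standardized_residuals(Y)
    Xc = X - r @ X                  # centre constraints using row masses as weights
    Xw = np.sqrt(r)[:, None] * Xc   # apply the row weights
    H = Xw @ np.linalg.pinv(Xw)     # orthogonal projector onto the constraint space
    return np.linalg.svd(H @ Q, compute_uv=False) ** 2

# Synthetic example (assumption: no real data involved).
rng = np.random.default_rng(0)
n_sites, n_species = 20, 40
# Adding 1 keeps every row and column total positive.
Y = rng.poisson(2.0, size=(n_sites, n_species)).astype(float) + 1.0
# Pure-noise "explanatory" variables, so any fit is over-fit.
X = rng.normal(size=(n_sites, n_sites - 1))

ca1 = ca_eigenvalues(Y)[0]
for k in (2, 8, 15, n_sites - 1):
    cca1 = cca_eigenvalues(Y, X[:, :k])[0]
    print(f"{k:2d} constraints: first CCA eigenvalue / first CA eigenvalue = {cca1 / ca1:.3f}")
```

The ratio can only grow as constraints are added to a nested set, and it reaches 1 once there are as many independent constraints as sites minus one, even though these constraints are noise. A ratio close to 1 with your real variables is the warning sign described above.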
Upvotes: 2