setbackademic
setbackademic

Reputation: 143

How do you determine how many variables is too many for a CCA?

I am running a CCA of some ecological data with ~50 sites and several hundred species. I know that you have to be careful when your number of explanatory variables approaches your number of samples. I have 23 explanatory variables, so this isn't a problem for me, but I have also heard that using too many explanatory variables can start to "un-constrain" the CCA.

Are there any guidelines about how many explanatory variables is appropriate? So far, I have just plotted them all and then removed the ones that appear to be redundant (leaving me with 8). Can I use the intertia values to help inform/justify this?

Thanks

Upvotes: 0

Views: 768

Answers (1)

Jari Oksanen
Jari Oksanen

Reputation: 3722

This is the same question as asking "how many variables are too many for regression analysis?". Not "almost the same", but exactly the same: CCA is an ordination of fitted values of linear regression. In most severe cases you can over-fit. In CCA this is evident when the first eigenvalues of CCA and (unconstrained) CA are almost identical and the ordinations look similar in first dimensions (you can use Procrustes analysis to check this). Extreme case would be that residual variation disappears, but in ordination you focus on first dimensions, and there the constraints can get lost much earlier than in later constrained axes or in residuals. More importantly: you must see CCA as a kind of regression analysis and have the same attitude to constraints as to explanatory (independent) variables in regression. If you have no prior hypothesis to study, you have all the problems of model selection of regression analysis plus the problems of multivariate ordination, but these are non-technical problems that should be handled somewhere else than in stackoverflow.

Upvotes: 2

Related Questions