SomePhysicsStudent

Reputation: 129

PCA (Principal Component Analysis) on multiple datasets

I have a set of climate data (temperature, pressure and moisture, for example), X, Y and Z, which are matrices with dimensions (n x p), where n is the number of observations and p is the number of spatial points.

Previously, to investigate modes of variability in dataset X, I simply performed an empirical orthogonal function (EOF) analysis, also known as principal component analysis (PCA), on X. This involved decomposing the matrix X via SVD.

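To investigate the coupling of the modes of variability of X and Y, I used maximum covariance analysis (MCA), which involved decomposing a cross-covariance matrix proportional to X^{T}Y (T denotes the transpose; with the (n x p) layout above this is a p x p matrix relating the spatial points of X to those of Y).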
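
For reference, the MCA step can be sketched in NumPy as follows. This is only a minimal illustration, assuming the (n x p) layout described above, so the cross-covariance between spatial points is formed as X^T Y / (n - 1) after removing the time mean. The function name `mca`, the `n_modes` parameter and the random test data are hypothetical.

```python
import numpy as np

def mca(X, Y, n_modes=3):
    """Maximum covariance analysis of two fields with n observations each."""
    # Remove the time mean at every spatial point.
    Xa = X - X.mean(axis=0)
    Ya = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Cross-covariance between spatial points of X and Y, shape (p x p).
    C = Xa.T @ Ya / (n - 1)
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    # Columns of U and rows of Vt are the paired spatial patterns.
    patterns_x = U[:, :n_modes]
    patterns_y = Vt[:n_modes].T
    # Fraction of squared covariance captured by each retained mode.
    scf = s[:n_modes] ** 2 / np.sum(s ** 2)
    return patterns_x, patterns_y, scf

# Hypothetical random data standing in for two climate fields.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
Y = rng.standard_normal((50, 8))
patterns_x, patterns_y, scf = mca(X, Y)
```

Each column of `patterns_x` is paired with the same column of `patterns_y`; projecting the anomalies onto them gives the corresponding expansion coefficients in time.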

However, if I wish to look at all three datasets together, how do I go about doing this? One idea I had was to form a fourth matrix, L, which is the 'feature' concatenation of the three datasets:

L = [X, Y, Z]

so that my matrix L will have dimensions (n x 3p).

I would then apply standard PCA/EOF analysis, using SVD to decompose this matrix L. This would give modes of variability of size (3p x 1), in which the first p values are the part of the mode associated with X, the next p values the part associated with Y, and the last p values the part associated with Z.
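
The concatenation idea can be sketched in NumPy as follows. This is a minimal illustration, not a definitive implementation. It adds one assumption that is not in the description above: each field is divided by its overall standard deviation before concatenation, because temperature, pressure and moisture have different units and the field with the largest variance would otherwise dominate the SVD. The function name `combined_eof` and the random test data are hypothetical.

```python
import numpy as np

def combined_eof(X, Y, Z, n_modes=3):
    """PCA/EOF analysis of the feature concatenation L = [X, Y, Z]."""
    fields = []
    for F in (X, Y, Z):
        A = F - F.mean(axis=0)  # remove the time mean at each point
        # Assumption: scale each field so that no variable dominates.
        fields.append(A / A.std())
    L = np.hstack(fields)  # shape (n, 3p)
    U, s, Vt = np.linalg.svd(L, full_matrices=False)
    p = X.shape[1]
    modes = Vt[:n_modes]  # each row is one mode of length 3p
    # Split every mode into the parts belonging to X, Y and Z.
    modes_x = modes[:, :p]
    modes_y = modes[:, p:2 * p]
    modes_z = modes[:, 2 * p:]
    pcs = U[:, :n_modes] * s[:n_modes]  # shared time series per mode
    var_frac = s[:n_modes] ** 2 / np.sum(s ** 2)
    return (modes_x, modes_y, modes_z), pcs, var_frac

# Hypothetical random data standing in for the three climate fields.
rng = np.random.default_rng(1)
n, p = 40, 6
X, Y, Z = (rng.standard_normal((n, p)) for _ in range(3))
(modes_x, modes_y, modes_z), pcs, var_frac = combined_eof(X, Y, Z)
```

Note that each mode has a single time series (a column of `pcs`) shared by all three variables, so this approach looks for patterns that maximise the combined variance rather than the covariance between fields.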

Is this correct? Or can anyone suggest a better way of looking at the coupling of all three (or more) datasets?

Thank you so much!

Upvotes: 2

Views: 1015

Answers (1)

Lukasz Tracewski

Reputation: 11377

I'd recommend treating the spatial points as an extra dimension, i.e. arranging the data as an f x n x p tensor, where f is the number of features (three here). You can then use a multilinear extension of PCA, which works directly on tensor data.
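
As a rough illustration of the tensor idea, the sketch below stacks the data into an f x n x p array and computes a truncated higher-order SVD (HOSVD), i.e. an SVD of each mode unfolding. HOSVD is swapped in here as one simple multilinear generalisation of PCA; it is not a full iterative multilinear PCA algorithm. The helper names `unfold` and `hosvd_factors`, the chosen ranks and the random data are hypothetical.

```python
import numpy as np

def unfold(T, mode):
    """Matricize tensor T so that rows index the given mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd_factors(T, ranks):
    """Leading left singular vectors of every mode unfolding of T."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    return factors

# Hypothetical data: f features, n observations, p spatial points.
rng = np.random.default_rng(2)
f, n, p = 3, 40, 6
T = rng.standard_normal((f, n, p))
# Remove the time mean for every feature and spatial point.
T = T - T.mean(axis=1, keepdims=True)

Uf, Un, Up = hosvd_factors(T, ranks=(3, 5, 4))
# Core tensor: project T onto the factor basis of every mode.
core = np.einsum('fnp,fa,nb,pc->abc', T, Uf, Un, Up)
```

Here the columns of `Up` play the role of spatial patterns, those of `Un` the role of temporal patterns, and `Uf` describes how the features combine; `core` records how strongly each combination of the three interacts.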

Upvotes: 1
