Unsupervised Filter Feature Selection - Rank by Correlation

I have a set of features that I wish to rank according to their correlation coefficients with each other, without accounting for the true label (that would be supervised feature selection, right?). My objective is to select first the feature that is most correlated with all the others, take it out, and so on.

The problem is how to test the correlation of a vector with a matrix (all the other vectors/features). Is it possible to do this, or am I going about this all wrong?

PS: I'm using MATLAB 2013b

Thank you all

Upvotes: 1

Views: 1885

Answers (1)

Amro

Reputation: 124563

Say you have an n-by-d matrix X where the rows are instances and the columns are the features/dimensions. You can then compute the correlation coefficient matrix with the corr function (Statistics Toolbox) or the built-in corrcoef function:

% Fisher Iris dataset, 150x4
>> load fisheriris
>> X = meas;

>> C = corr(X)
C =
    1.0000   -0.1176    0.8718    0.8179
   -0.1176    1.0000   -0.4284   -0.3661
    0.8718   -0.4284    1.0000    0.9629
    0.8179   -0.3661    0.9629    1.0000

The result is a d-by-d matrix containing the correlation coefficient of each feature against every other feature. The diagonal is all ones (because corr(x,x) = 1), and the matrix is symmetric (because corr(x,y) = corr(y,x)). Values range from -1 to 1, where -1 means a perfect negative (inverse) linear correlation between two variables, 1 means a perfect positive linear correlation, and 0 means no linear correlation.
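
On the vector-versus-matrix part of the question: row i of C already holds the correlations of feature i with every feature, so no separate computation is needed. A minimal sketch, with the C values printed above hard-coded so the snippet runs on its own (the names i, others and r are only illustrative):

```matlab
% Correlation matrix copied from the output above (Fisher Iris, 4 features)
C = [ 1.0000 -0.1176  0.8718  0.8179;
     -0.1176  1.0000 -0.4284 -0.3661;
      0.8718 -0.4284  1.0000  0.9629;
      0.8179 -0.3661  0.9629  1.0000];

i = 1;                               % feature of interest
others = setdiff(1:size(C,1), i);    % indices of the remaining features
r = C(i, others);                    % correlations of feature i with the rest
disp(r)                              % -0.1176  0.8718  0.8179
```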

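Now, because you want to remove the feature that is on average the most correlated with the other features, you have to summarize that matrix as one number per feature. One way to do that is to compute the mean of each column (this includes the diagonal 1, which shifts every feature's score by the same amount and so does not change the ranking):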

% column-wise mean correlation, one value per feature
>> mean_corr = mean(C)
mean_corr =
    0.6430    0.0220    0.6015    0.6037

% most correlated feature on average
>> [~,idx] = max(mean_corr)
idx =
     1

% drop that feature
>> X(:,idx) = [];

EDIT:

I probably should have taken the mean of the absolute value of C in the above code, because we don't care if two variables are positively or negatively correlated, only how strong the correlation is.
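
A minimal sketch of that correction, extended to the full ranking the question asks for. It makes a few assumptions that are not in the answer above: the diagonal is left out of the average, ties go to the lower feature index (which is how max behaves), and the names A, score_all, remaining and ranking are illustrative. C is hard-coded from the earlier output so the snippet is self-contained:

```matlab
% Correlation matrix copied from the output above (Fisher Iris, 4 features)
C = [ 1.0000 -0.1176  0.8718  0.8179;
     -0.1176  1.0000 -0.4284 -0.3661;
      0.8718 -0.4284  1.0000  0.9629;
      0.8179 -0.3661  0.9629  1.0000];

d = size(C, 1);
A = abs(C);                    % only the strength of the correlation matters
A(logical(eye(d))) = 0;        % ignore each feature's correlation with itself

% Mean absolute correlation of each feature with the other d-1 features
score_all = sum(A, 2)' / (d - 1);

% Greedy ranking: repeatedly pick the feature most correlated (on average)
% with the features that are still in the pool, then remove it.
remaining = 1:d;
ranking = [];
while numel(remaining) > 1
    s = sum(A(remaining, remaining), 2) / (numel(remaining) - 1);
    [~, k] = max(s);           % on ties, max returns the first index
    ranking(end+1) = remaining(k);
    remaining(k) = [];
end
ranking(end+1) = remaining;    % the last feature left closes the ranking

disp(ranking)                  % 3 4 1 2
```

With these values the ranking comes out as 3, 4, 1, 2, so once the sign is ignored the first feature picked is no longer feature 1. Pairwise correlations do not change when other columns are removed, so indexing into C should be equivalent to recomputing corr on the remaining columns at each step.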

Upvotes: 2
