ECub Devs

Reputation: 175

How to compute MxN correlation matrix

I have a CSV file that contains some tweets and two sets of features (A and B), as follows:

TWEET, A1, A2, B1, B2, B3
tweet text, 0.23, 0.54, 120, 60, 39
tweet text, 0.33, 0.7, 70, 20, 36
tweet text, 0.8, 0.41, 68, 52, 29

As you can see, the two sets have different sizes (2 columns of A features and 3 columns of B features). I want to measure their relationship (or dependence) on each other. My goal is to identify dependent features so that I can remove some of them and reduce the feature dimensionality. One possible solution is a correlation matrix, which is available through DataFrame.corr. But this matrix only accepts arrays of the same size. How can I compute the correlation matrix between feature sets of different sizes, like A and B in the example above? After computing the correlations, I would be able to say, for example, that features A1 and B2 are sufficient and that the other features can be removed because they are completely dependent on A1 and B2.

Any other suggestions are welcome.

Upvotes: 1

Views: 320

Answers (1)

Equinox

Reputation: 6758

The correlation matrix doesn't need to be MxN. DataFrame.corr computes the correlation between every pair of the N numeric columns, so the result is an NxN matrix. From that NxN matrix you can keep the entries you care about (for example, the A-versus-B pairs) and ignore the rest.

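import seaborn as sns
import pandas as pd
from io import StringIO

# skipinitialspace drops the blank after each comma, so columns are named "A1", not " A1"
df = pd.read_csv(StringIO('''TWEET, A1, A2, B1, B2, B3
tweet text, 0.23, 0.54, 120, 60, 39
tweet text, 0.33, 0.7, 70, 20, 36
tweet text, 0.8, 0.41, 68, 52, 29
'''), sep=',', skipinitialspace=True)

# numeric_only=True skips the text column TWEET (pandas 2.0+ raises an error without it)
corr = df.corr(numeric_only=True)
print(corr)  # Pandas correlation matrix
sns.heatmap(corr, annot=True)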

Output:

          A1        A2        B1        B2        B3
A1  1.000000 -0.732859 -0.661319  0.167649 -0.991352
A2 -0.732859  1.000000 -0.025703 -0.793614  0.637235
B1 -0.661319 -0.025703  1.000000  0.628619  0.754036
B2  0.167649 -0.793614  0.628619  1.000000 -0.036827
B3 -0.991352  0.637235  0.754036 -0.036827  1.000000
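
If only the A-versus-B relationships are of interest, the MxN block can be sliced out of the full NxN matrix. The following is a minimal sketch under a few assumptions: the column lists a_cols and b_cols, the names cross_corr and strong_pairs, and the 0.9 cutoff are illustrative choices, not part of the original question. With only three rows the numbers say little statistically; the point is the mechanics.

```python
from io import StringIO

import pandas as pd

csv_text = """TWEET, A1, A2, B1, B2, B3
tweet text, 0.23, 0.54, 120, 60, 39
tweet text, 0.33, 0.7, 70, 20, 36
tweet text, 0.8, 0.41, 68, 52, 29
"""

# skipinitialspace strips the blank after each comma in the header and values
df = pd.read_csv(StringIO(csv_text), skipinitialspace=True)

# Assumed grouping of the feature columns into the two sets
a_cols = ["A1", "A2"]
b_cols = ["B1", "B2", "B3"]

# Full NxN Pearson correlation over the numeric feature columns only
full_corr = df[a_cols + b_cols].corr()

# Rows = A features, columns = B features: the 2x3 (MxN) cross-correlation block
cross_corr = full_corr.loc[a_cols, b_cols]
print(cross_corr)

# Hypothetical cutoff: list A/B pairs whose absolute correlation is at least 0.9
threshold = 0.9
strong_pairs = [
    (a, b, cross_corr.loc[a, b])
    for a in a_cols
    for b in b_cols
    if abs(cross_corr.loc[a, b]) >= threshold
]
print(strong_pairs)
```

On this sample, the only pair above the cutoff is A1 and B3 (about -0.99), which suggests one of the two may be redundant. Note that Pearson correlation only captures linear dependence, so a low value does not by itself prove that two features are independent.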


Upvotes: 1
