TheChymera

Reputation: 17924

Calculate linear regression slope matrix (analogous to correlation matrix) - Python/Pandas

Pandas has a really nice function, pd.DataFrame.corr(), that returns the correlation matrix of a DataFrame as another DataFrame.

The correlation coefficient r, however, isn't always that informative. Depending on your application, the slope of the linear regression may be just as important. Is there a function that returns the pairwise slopes for an input matrix or DataFrame?

Other than iterating over every column pair with scipy.stats.linregress(), which would be a pain, I don't see any way to do this.

Upvotes: 0

Views: 2423

Answers (1)

user2285236


The slope of a regression line y = b0 + b1 * x can also be calculated from the correlation coefficient: b1 = corr(x, y) * σy / σx
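The identity follows from the ordinary least-squares slope being cov(x, y) / var(x). A short derivation, with σ denoting sample standard deviations computed with the same ddof throughout (so the normalization cancels):

```latex
b_1 = \frac{\operatorname{cov}(x, y)}{\sigma_x^2}
    = \frac{\operatorname{corr}(x, y)\,\sigma_x \sigma_y}{\sigma_x^2}
    = \operatorname{corr}(x, y)\,\frac{\sigma_y}{\sigma_x}
```

Swapping the roles of x and y gives corr(x, y) * σx / σy instead, which is why the slope matrix, unlike the correlation matrix, is not symmetric.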

Using numpy's newaxis to create the σy / σx matrix (rows index x, columns index y):

import numpy as np
df.corr() * (df.std().values / df.std().values[:, np.newaxis])
Out[59]: 
          A         B         C
A  1.000000 -0.686981  0.252078
B -0.473282  1.000000 -0.263359
C  0.137670 -0.208775  1.000000

where df is:

df
Out[60]: 
   A  B  C
0  5  6  9
1  4  4  2
2  7  3  5
3  4  3  9
4  6  5  3
5  3  8  6
6  2  8  1
7  7  2  7
8  4  1  5
9  1  6  6

And this is for verification, regressing every column pair with scipy.stats.linregress:

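import itertools
import numpy as np
from scipy.stats import linregress
res = []
# col1 is x (row label), col2 is y (column label)
for col1, col2 in itertools.product(df.columns, repeat=2):
    res.append(linregress(df[col1], df[col2]).slope)
np.array(res).reshape(len(df.columns), -1)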
Out[72]: 
array([[ 1.        , -0.68698061,  0.25207756],
       [-0.47328244,  1.        , -0.26335878],
       [ 0.1376702 , -0.20877458,  1.        ]])
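Putting the pieces together, here is a minimal self-contained sketch, assuming pandas and NumPy are installed. slope_matrix and slope_matrix_cov are hypothetical helper names for illustration, not pandas API; the second one uses the equivalent textbook form b1 = cov(x, y) / var(x) as an independent cross-check. In the result, the row label is x (predictor) and the column label is y (response).

```python
import numpy as np
import pandas as pd

# Sample data copied from the df shown above
df = pd.DataFrame({
    "A": [5, 4, 7, 4, 6, 3, 2, 7, 4, 1],
    "B": [6, 4, 3, 3, 5, 8, 8, 2, 1, 6],
    "C": [9, 2, 5, 9, 3, 6, 1, 7, 5, 6],
})


def slope_matrix(frame: pd.DataFrame) -> pd.DataFrame:
    """Entry [x, y] is the least-squares slope b1 of y = b0 + b1 * x."""
    std = frame.std().to_numpy()
    # Broadcast sigma_y / sigma_x: columns index y, rows index x
    ratio = std[np.newaxis, :] / std[:, np.newaxis]
    return frame.corr() * ratio


def slope_matrix_cov(frame: pd.DataFrame) -> pd.DataFrame:
    """Same result via b1 = cov(x, y) / var(x): divide each row x by var(x)."""
    return frame.cov().div(frame.var(), axis=0)


slopes = slope_matrix(df)
print(slopes)
```

Because std(), var() and cov() all default to ddof=1, the normalization cancels and either form should match linregress. One caveat: with missing values, DataFrame.corr() and DataFrame.cov() use pairwise-complete observations while std() and var() use each whole column, so results can differ from a per-pair regression on data containing NaN.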

Upvotes: 2
