Run univariate regression between each variable python

Question

I have a dataframe with 49 columns, and I want to see if there is some relation between the columns, i.e. run a simple linear regression between each pair of columns. The expected output is a matrix whose rows and columns carry the same names and whose cells are filled with the regression coefficients.

e.g. df:

bar foo too ten
1   2   3   4
4   5   6   5
7   8   9   6

Output:

     bar             foo             too              ten
bar  r_coef(bar,bar) r_coef(bar,foo) r_coef(bar,too)  r_coef(bar,ten)
foo  r_coef(foo,bar) r_coef(foo,foo) r_coef(foo,too)  r_coef(foo,ten)
too  r_coef(too,bar) r_coef(too,foo) r_coef(too,too)  r_coef(too,ten)
ten  r_coef(ten,bar) r_coef(ten,foo) r_coef(ten,too)  r_coef(ten,ten)

Corralien · Accepted Answer

IIUC, you can use np.polyfit. A simple linear regression is a first-degree polynomial (y = mx + b), so set the degree to 1. np.polyfit then returns [m, b]; here I take the intercept value (b).
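To make the indexing concrete, here is a minimal sketch on two columns from the question's example (bar and ten). The array names are only illustrative:

```python
import numpy as np

# Two columns from the example dataframe in the question
bar = np.array([1, 4, 7])
ten = np.array([4, 5, 6])

# For deg=1, np.polyfit returns the coefficients highest degree first:
# index 0 is the slope (m), index 1 is the intercept (b)
slope, intercept = np.polyfit(bar, ten, deg=1)
print(slope, intercept)
```

If the slope is what you mean by "regression coefficient", index with [0] instead of [1].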

As @mozway suggests, use corr with a custom method:

import numpy as np

# np.polyfit returns [slope, intercept]: [0] is the slope, [1] is the intercept
r_coef = lambda x, y: np.polyfit(x, y, deg=1)[1]

out = df.corr(method=r_coef)
print(out)

# Output
          bar       foo  too       ten
bar  1.000000  1.000000  2.0  3.666667
foo  1.000000  1.000000  1.0  3.333333
too  2.000000  1.000000  1.0  3.000000
ten  3.666667  3.333333  3.0  1.000000
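One caveat: with a callable method, df.corr always puts 1 on the diagonal and returns a symmetric matrix, whatever the callable does. A regression of y on x is generally not the mirror image of x on y, so if you need every ordered pair, an explicit loop may be safer. The following is only a sketch under that assumption. pairwise_fit is a hypothetical helper name, not a pandas or NumPy function, and it uses rows as the predictor (x) and columns as the response (y):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "bar": [1, 4, 7],
    "foo": [2, 5, 8],
    "too": [3, 6, 9],
    "ten": [4, 5, 6],
})

def pairwise_fit(frame, index=0):
    """Hypothetical helper: fit y = m*x + b for every ordered pair of columns.

    Rows are the predictor x, columns are the response y.
    index=0 keeps the slope, index=1 keeps the intercept.
    """
    cols = frame.columns
    out = pd.DataFrame(np.nan, index=cols, columns=cols)
    for x in cols:
        for y in cols:
            out.loc[x, y] = np.polyfit(frame[x], frame[y], deg=1)[index]
    return out

slopes = pairwise_fit(df, index=0)
print(slopes)
```

Here the slope of ten on bar is 1/3 while the slope of bar on ten is 3, which the symmetric corr output cannot show. For 49 columns this is 49 × 49 small fits, which should still run quickly.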
