ker_laeda86

Reputation: 287

Run univariate regression between each pair of variables in Python

I have a dataframe with 49 columns, and I want to see whether there is some relation between the columns, i.e. run a simple linear regression between each pair of columns. The expected output is a matrix whose rows and columns carry the same names and whose cells hold the regression coefficients.

e.g. df:

bar foo too ten
1   2   3   4
4   5   6   5
7   8   9   6

Output:

     bar             foo             too              ten
bar  r_coef(bar,bar) r_coef(bar,foo) r_coef(bar,too)  r_coef(bar,ten)
foo  r_coef(foo,bar) r_coef(foo,foo) r_coef(foo,too)  r_coef(foo,ten)
too  r_coef(too,bar) r_coef(too,foo) r_coef(too,too)  r_coef(too,ten)
ten  r_coef(ten,bar) r_coef(ten,foo) r_coef(ten,too)  r_coef(ten,ten)

Upvotes: 1

Views: 168

Answers (2)

Corralien

Reputation: 120419

IIUC, you can use np.polyfit. A simple linear regression is a first-degree polynomial (y = mx + b), so set deg to 1. The code below returns the intercept (b); take index [0] instead if you want the slope (m).

As @mozway suggests, use corr, but with a custom method:

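import numpy as np

# For deg=1, np.polyfit returns [slope, intercept]:
# [1] is the intercept value, [0] is the slope
r_coef = lambda x, y: np.polyfit(x, y, deg=1)[1]

# corr applies r_coef to each pair of columns
out = df.corr(method=r_coef)
print(out)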

# Output
          bar       foo  too       ten
bar  1.000000  1.000000  2.0  3.666667
foo  1.000000  1.000000  1.0  3.333333
too  2.000000  1.000000  1.0  3.000000
ten  3.666667  3.333333  3.0  1.000000
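
One caveat with this approach: per the pandas documentation, the matrix returned by corr with a callable is always symmetric with 1 on the diagonal, regardless of what the callable returns, whereas regressing y on x generally gives a different coefficient than regressing x on y. A minimal sketch that fills every cell separately, assuming the slope is the coefficient wanted (the helper name slope_matrix is made up for illustration):

```python
import numpy as np
import pandas as pd

# Example frame from the question
df = pd.DataFrame({"bar": [1, 4, 7], "foo": [2, 5, 8],
                   "too": [3, 6, 9], "ten": [4, 5, 6]})

def slope_matrix(frame):
    """Hypothetical helper: cell (x, y) holds the slope of y regressed on x."""
    cols = frame.columns
    out = pd.DataFrame(index=cols, columns=cols, dtype=float)
    for x in cols:
        for y in cols:
            # np.polyfit returns [slope, intercept] for deg=1
            out.loc[x, y] = np.polyfit(frame[x], frame[y], deg=1)[0]
    return out

slopes = slope_matrix(df)
print(slopes)
```

With 49 columns this runs 49 x 49 small fits, which should still be quick; switch the index to [1] to collect intercepts instead.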

Upvotes: 2

mozway

Reputation: 260735

Looks like you simply want to use corr:

df.corr()

output:

     bar  foo  too  ten
bar  1.0  1.0  1.0  1.0
foo  1.0  1.0  1.0  1.0
too  1.0  1.0  1.0  1.0
ten  1.0  1.0  1.0  1.0

A less ambiguous example:

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.random(size=(4,4)),
                  columns=['bar', 'foo', 'too', 'ten'])

df.corr()
          bar       foo       too       ten
bar  1.000000 -0.701808  0.595832 -0.211943
foo -0.701808  1.000000 -0.911949 -0.547439
too  0.595832 -0.911949  1.000000  0.551369
ten -0.211943 -0.547439  0.551369  1.000000
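
For reference, the Pearson correlation and the simple-regression slope are closely related: the slope of y regressed on x equals cov(x, y) / var(x), which is r times std(y) / std(x). A sketch under that identity, reusing the random frame above (the names slopes and r_from_slopes are illustrative only):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.random(size=(4, 4)),
                  columns=['bar', 'foo', 'too', 'ten'])

# Row x, column y: slope of y regressed on x, i.e. cov(x, y) / var(x)
slopes = df.cov().div(df.var(), axis=0)

# Rescaling each slope by std(x) / std(y) recovers the correlation matrix
std = df.std()
r_from_slopes = slopes.mul(std, axis=0).div(std, axis=1)

print(slopes)
print(np.allclose(r_from_slopes, df.corr()))
```

So corr answers "how strong is the linear relation", while the slope matrix answers "how much does y change per unit of x"; unlike corr, the slope matrix is not symmetric in general.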

Upvotes: 1
