Reputation: 287
I have a dataframe with 49 columns and want to see whether there is some relation between the columns, i.e. run a simple linear regression between each pair of columns. The expected output is a matrix whose rows and columns carry the same column names and whose cells hold the regression coefficient.
e.g. df:
bar  foo  too  ten
  1    2    3    4
  4    5    6    5
  7    8    9    6
Output:
     bar              foo              too              ten
bar  r_coef(bar,bar)  r_coef(bar,foo)  r_coef(bar,too)  r_coef(bar,ten)
foo  r_coef(foo,bar)  r_coef(foo,foo)  r_coef(foo,too)  r_coef(foo,ten)
too  r_coef(too,bar)  r_coef(too,foo)  r_coef(too,too)  r_coef(too,ten)
ten  r_coef(ten,bar)  r_coef(ten,foo)  r_coef(ten,too)  r_coef(ten,ten)
Upvotes: 1
Views: 168
Reputation: 120419
IIUC, you can use np.polyfit. You have a first-degree polynomial (y = mx + b), so set the degree to 1 and take the intercept value (b).
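To make the coefficient order concrete, here is a minimal sketch of np.polyfit on two columns of the question's sample data. The array names are just illustrative; the only thing taken from the thread is the data itself.

```python
import numpy as np

# Columns "bar" and "ten" from the question's sample data.
bar = np.array([1, 4, 7])
ten = np.array([4, 5, 6])

# For deg=1, polyfit returns the coefficients from highest power to lowest:
# index 0 is the slope (m), index 1 is the intercept (b).
slope, intercept = np.polyfit(bar, ten, deg=1)
print(slope, intercept)
```

Swapping the two arguments fits bar as a function of ten instead, which in general gives a different slope and intercept.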
As @mozway suggests, use corr, but with a custom method:
import numpy as np

# [1] is the intercept value, [0] is the slope
r_coef = lambda x, y: np.polyfit(x, y, deg=1)[1]
# Note: corr always returns a symmetric matrix with 1 on the diagonal
out = df.corr(method=r_coef)
print(out)
# Output
          bar       foo  too       ten
bar  1.000000  1.000000  2.0  3.666667
foo  1.000000  1.000000  1.0  3.333333
too  2.000000  1.000000  1.0  3.000000
ten  3.666667  3.333333  3.0  1.000000
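One caveat: corr with a callable calls it once per column pair and mirrors the result, and it sets the diagonal to 1, so the matrix above is symmetric even though a regression of y on x generally differs from x on y. If the full, asymmetric matrix of slopes is wanted, an explicit loop is one option. The sketch below is an assumption about what the question is after; slope_matrix is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"bar": [1, 4, 7], "foo": [2, 5, 8],
                   "too": [3, 6, 9], "ten": [4, 5, 6]})

def slope_matrix(frame):
    """Hypothetical helper: entry [x, y] is the slope m of the fit y = m*x + b."""
    cols = frame.columns
    out = pd.DataFrame(np.nan, index=cols, columns=cols, dtype=float)
    for x in cols:
        for y in cols:
            # Index 0 of the polyfit result is the slope.
            out.loc[x, y] = np.polyfit(frame[x], frame[y], deg=1)[0]
    return out

slopes = slope_matrix(df)
print(slopes.round(3))
```

With 49 columns this runs 49 × 49 = 2401 small fits, which should still be quick. Replace `[0]` with `[1]` to collect intercepts instead.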
Upvotes: 2
Reputation: 260735
Looks like you simply want to use corr:
df.corr()
output:
     bar  foo  too  ten
bar  1.0  1.0  1.0  1.0
foo  1.0  1.0  1.0  1.0
too  1.0  1.0  1.0  1.0
ten  1.0  1.0  1.0  1.0
Example with random data:
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.random(size=(4, 4)),
                  columns=['bar', 'foo', 'too', 'ten'])
df.corr()
output:
          bar       foo       too       ten
bar  1.000000 -0.701808  0.595832 -0.211943
foo -0.701808  1.000000 -0.911949 -0.547439
too  0.595832 -0.911949  1.000000  0.551369
ten -0.211943 -0.547439  0.551369  1.000000
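For context on how corr relates to a simple linear regression: the Pearson coefficient is the least-squares slope rescaled by the two standard deviations, r = m · std(x) / std(y). The following is a rough sketch of that relationship on the same random data; the variable names slopes and recovered are illustrative only.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.random(size=(4, 4)),
                  columns=["bar", "foo", "too", "ten"])

# Entry [x, y] = cov(x, y) / var(x): the least-squares slope of y regressed on x.
slopes = df.cov().div(df.var(), axis=0)

# Multiplying row x by std(x) and dividing column y by std(y)
# turns each slope into the Pearson correlation of that pair.
std = df.std()
recovered = slopes.mul(std, axis=0).div(std, axis=1)

print(np.allclose(recovered, df.corr()))
```

So corr answers "how strong is the linear relation", while the slope matrix answers "how much does y change per unit of x"; the cov/var form also avoids looping over column pairs.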
Upvotes: 1