Reputation: 335
I'm setting a pandas dataframe df1
and creating new column based on formulas from an excel sheet. I need to use the pd.corr
function to find correlation between multiple columns and using the results to create a new column on the same dataframe df1
.
sample df1
data:
trc_2017 trc_2016 trc_2015 tsr_2017 tsr_2016 tsr_2015 roe
0.21 1.23 2.54 9.42 6.14 -0.01 2.35
1.02 -3.21 0.12 9.32 5.24 6.90 3.29
-3.02 0.12 1.32 5.02 2.02 32.10 4.32
Tried these solutions:
df['corr'] = df[['trc_2017', 'trc_2016', 'trc_2015']].corr(df[['tsr_2017', 'tsr_2016', 'tsr_2015']])
solution above but i get error:
ValueError: The truth value of a DataFrame is ambiguous.
second solution:
df['corr'] = df[['trc_2017', 'trc_2016', 'trc_2015']].corrwith(df[['tsr_2017', 'tsr_2016', 'tsr_2015']])
gives all value of df['corr']
is NaN
Expected results:
trc_2017 trc_2016 trc_2015 tsr_2017 tsr_2016 tsr_2015 roe corr
0.21 1.23 2.54 9.42 6.14 -0.01 2.35 0.11
1.02 -3.21 0.12 9.32 5.24 6.90 3.29 3.21
-3.02 0.12 1.32 5.02 2.02 32.10 4.32 0.12
What is the best way using pandas to find the correlation for the column corr
?
Upvotes: 1
Views: 487
Reputation: 863501
Use corrwith
with rename
for same columns names in both subsets of DataFrame
s:
c1 = ['trc_2017', 'trc_2016', 'trc_2015']
c2 = ['tsr_2017', 'tsr_2016', 'tsr_2015']
df['corr'] = df[c1].corrwith(df[c2].rename(columns=dict(zip(c2, c1))), axis=1)
print (df)
trc_2017 trc_2016 trc_2015 tsr_2017 tsr_2016 tsr_2015 roe corr
0 0.21 1.23 2.54 9.42 6.14 -0.01 2.35 -0.994782
1 1.02 -3.21 0.12 9.32 5.24 6.90 3.29 0.910052
2 -3.02 0.12 1.32 5.02 2.02 32.10 4.32 0.647252
Upvotes: 2