Reputation: 229
I have two dataframes and I am trying to compute the Pearson correlation coefficient between each row of one and the corresponding row of the other.
I am currently using the following code to generate the correlation values:
Corr_df = df_A.corrwith(df_B, axis = 1)
However, the resulting Corr_df contains only null (NaN) values. I can generate the same correlation coefficients in Excel using the CORREL formula, so why isn't Python doing the same?
The link to df_A is: https://drive.google.com/file/d/1gyBbH2MYQM_oM5wwLIkIoOrSADgooWIu/view?usp=sharing
The link to df_B is: https://drive.google.com/file/d/1lr60I-DLSaiSHVFRebXwxEH1J_ebbzoP/view?usp=sharing
Please help me out here!
Upvotes: 0
Views: 2276
Reputation: 83
corrwith matches the two DataFrames by label, so it only works when both have the same column names. Otherwise it won't work and returns NaN. The question "pd.corrwith on pandas dataframes with different column names" should help a bit.
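A minimal sketch of that behaviour, using two small made-up frames (the column names and values are illustrative assumptions, not taken from the linked files). With `axis=1`, values are paired by column label, so frames with no labels in common leave nothing to correlate:

```python
import pandas as pd

# Illustrative data only: same shape, but no column names in common.
df_A = pd.DataFrame({"a": [1.0, 2.0], "b": [2.0, 1.0], "c": [3.0, 5.0]})
df_B = pd.DataFrame({"x": [2.0, 9.0], "y": [4.0, 7.0], "z": [6.0, 8.0]})

# No shared column labels, so every row-wise correlation comes back as NaN.
mismatched = df_A.corrwith(df_B, axis=1)

# Renaming df_B's columns to df_A's (matched by position) lets the rows line up.
rename_map = dict(zip(df_B.columns, df_A.columns))
aligned = df_A.corrwith(df_B.rename(columns=rename_map), axis=1)

print(mismatched)
print(aligned)
```

This assumes the columns of the two frames correspond by position, which is the same assumption Excel's CORREL makes when given two ranges.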
Upvotes: 1
Reputation: 862611
You need the same column names in both DataFrames:
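import pandas as pd

df_A = pd.read_excel('A.xlsx')
df_B = pd.read_excel('B.xlsx')  # assumed file name for the second sheet
# corrwith aligns on labels, so give df_B the same column names as df_A
df_B.columns = df_A.columns
Corr_df = df_A.corrwith(df_B, axis = 1)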
Alternative:
# map df_B's column names onto df_A's, matched by position
d = dict(zip(df_B.columns, df_A.columns))
Corr_df = df_A.corrwith(df_B.rename(columns=d), axis = 1)
print(Corr_df.head())
0 1.0
1 1.0
2 1.0
3 1.0
4 1.0
dtype: float64
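If the columns are only meant to be matched by position, another option is to skip label alignment and compute the row-wise Pearson coefficient on the underlying arrays. A rough sketch, assuming both frames have the same shape and hold only numeric values with no NaN; `rowwise_pearson` is a hypothetical helper name (not a pandas function) and the sample data is made up:

```python
import numpy as np
import pandas as pd

def rowwise_pearson(left: pd.DataFrame, right: pd.DataFrame) -> pd.Series:
    """Pearson correlation of each row of `left` with the same row of `right`,
    pairing columns by position and ignoring column labels."""
    a = left.to_numpy(dtype=float)
    b = right.to_numpy(dtype=float)
    if a.shape != b.shape:
        raise ValueError("both frames must have the same shape")
    # Centre each row on its own mean.
    a_c = a - a.mean(axis=1, keepdims=True)
    b_c = b - b.mean(axis=1, keepdims=True)
    # Covariance numerator divided by the product of the row norms.
    num = (a_c * b_c).sum(axis=1)
    den = np.sqrt((a_c ** 2).sum(axis=1) * (b_c ** 2).sum(axis=1))
    return pd.Series(num / den, index=left.index)

# Illustrative data only.
df_A = pd.DataFrame({"a": [1.0, 2.0], "b": [2.0, 1.0], "c": [3.0, 5.0]})
df_B = pd.DataFrame({"x": [2.0, 9.0], "y": [4.0, 7.0], "z": [6.0, 8.0]})
result = rowwise_pearson(df_A, df_B)
print(result)
```

A row whose values are all identical has zero variance, so its coefficient comes out as NaN here (with a NumPy runtime warning), in the same way Excel's CORREL reports an error for such a range.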
Upvotes: 2