Gautham Kanthasamy

Reputation: 229

Corrwith returns null values

I have two dataframes for which I am trying to generate the Pearson Correlation Coefficient using values from each row of each dataframe.

I am currently using the following code to generate a new dataframe with the correlation values:

Corr_df = df_A.corrwith(df_B, axis = 1)

However, the resulting Corr_df contains only null values. I can generate the same correlation coefficients in Excel using the CORREL formula, so why isn't Python doing the same?

The link to df_A is: https://drive.google.com/file/d/1gyBbH2MYQM_oM5wwLIkIoOrSADgooWIu/view?usp=sharing

The link to df_B is: https://drive.google.com/file/d/1lr60I-DLSaiSHVFRebXwxEH1J_ebbzoP/view?usp=sharing

Please help me out here!

Upvotes: 0

Views: 2276

Answers (2)

Pulkit Kedia

Reputation: 83

corrwith only works when both DataFrames have the same column names; otherwise it returns NaN. The question "pd.corrwith on pandas dataframes with different column names" may help a bit.
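To illustrate the point about column names, here is a minimal sketch on made-up data (the frames and the labels a1..a3 and b1..b3 are hypothetical, not the asker's files). corrwith aligns the two frames on their labels before correlating, so with no shared column names there is nothing left to compare and every row should come back as NaN:

```python
import pandas as pd

# Hypothetical toy frames: same shape, but no column label in common.
df_A = pd.DataFrame([[1.0, 2.0, 3.0], [4.0, 6.0, 5.0]], columns=["a1", "a2", "a3"])
df_B = pd.DataFrame([[2.0, 4.0, 6.0], [3.0, 2.0, 1.0]], columns=["b1", "b2", "b3"])

# Label alignment leaves no overlapping columns, so each row-wise result is NaN.
mismatched = df_A.corrwith(df_B, axis=1)
print(mismatched.isna().all())
```

Excel's CORREL pairs values purely by position, which is likely why the spreadsheet gives numbers while pandas does not.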

Upvotes: 1

jezrael

Reputation: 862611

You need the same column names in both DataFrames, because corrwith aligns the DataFrames on their labels before computing the correlations:

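import pandas as pd

# File names below stand in for the two linked workbooks
df_A = pd.read_excel('A.xlsx')
df_B = pd.read_excel('B.xlsx')

# Relabel df_B's columns by position so they match df_A
df_B.columns = df_A.columns
Corr_df = df_A.corrwith(df_B, axis = 1)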

Alternative:

# Map each df_B column name to the df_A name in the same position
d = dict(zip(df_B.columns, df_A.columns))
Corr_df = df_A.corrwith(df_B.rename(columns=d), axis = 1)

print(Corr_df.head())
0    1.0
1    1.0
2    1.0
3    1.0
4    1.0
dtype: float64
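If you want to sanity-check the result against a position-based calculation like Excel's CORREL, something along these lines may help. It is only a sketch on made-up data (the toy frames and the names aligned_B and expected are assumptions for illustration); it relabels by position with set_axis and compares against a per-row np.corrcoef:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frames with different column labels.
df_A = pd.DataFrame([[1.0, 2.0, 3.0], [4.0, 6.0, 5.0]], columns=["a1", "a2", "a3"])
df_B = pd.DataFrame([[2.0, 4.0, 6.0], [3.0, 2.0, 1.0]], columns=["b1", "b2", "b3"])

# Give df_B the labels of df_A by position, then correlate row by row.
aligned_B = df_B.set_axis(df_A.columns, axis=1)
Corr_df = df_A.corrwith(aligned_B, axis=1)

# Reference values: plain Pearson per row, pairing values by position.
expected = [np.corrcoef(a, b)[0, 1] for a, b in zip(df_A.to_numpy(), df_B.to_numpy())]

print(Corr_df.tolist())  # approximately [1.0, -0.5]
print(expected)          # should agree with the line above
```

If the two printed lists agree on your real data as well, the earlier nulls came from label alignment rather than from the values themselves.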

Upvotes: 2
