Name_is_Newton

Reputation: 11

Kendall Tau for series/dataframes - Pandas (Python)

I have been trying to compute Kendall's tau rank correlation coefficient for two series with the Pandas library (Python) using different methods. Surprisingly, the result differs depending on whether I call corr on a Series or on a DataFrame, and it even changes with the order in which the series are concatenated into the DataFrame.

(links to the Pandas documentation: https://pandas.pydata.org/docs/reference/api/pandas.Series.corr.html , https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.corr.html#pandas.DataFrame.corr)

To give an example, here are two sample series:

import pandas as pd

u1 = pd.Series([0.000000, 0.000000, 0.000000, 1.744147, 0.000000, 0.000000, 0.000000])
u2 = pd.Series([7.048640, 0.000000, 0.000000, 3.744840, 0.000000, 0.000000, 2.739336])

print('Method for series', u1.corr(u2, method='kendall'))
print('Method for dataframes #1', pd.concat([u1, u2], axis=1).corr(method='kendall'))
print('Method for dataframes #2', pd.concat([u2, u1], axis=1).corr(method='kendall'))

I don't understand why the correlation results differ, given that the inputs are the same in all three cases.

Any help would be greatly appreciated!

Upvotes: 0

Views: 923

Answers (1)

Name_is_Newton

Reputation: 11

Issue solved!

This was a Pandas version issue. After upgrading from Pandas 1.3.1 to Pandas 1.4.1, all the methods above return the same coefficient: 0.421637.
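As a sanity check that does not depend on the Pandas version, the coefficient can be reproduced by counting pairs directly. The following is a minimal sketch, assuming the tau-b variant (the one that corrects for ties) is the one being computed; the helper name kendall_tau_b is my own and is not part of Pandas.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b by brute-force pair counting, O(n^2).

    Hypothetical helper for illustration only, not a Pandas function.
    """
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1          # pair tied in x
        if dy == 0:
            ties_y += 1          # pair tied in y
        if dx * dy > 0:
            concordant += 1      # both series move in the same direction
        elif dx * dy < 0:
            discordant += 1      # the series move in opposite directions
    n_pairs = n * (n - 1) // 2
    denom = sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denom == 0:
        return float("nan")      # undefined when one series is constant
    return (concordant - discordant) / denom


# Same values as the two series in the question
u1 = [0.000000, 0.000000, 0.000000, 1.744147, 0.000000, 0.000000, 0.000000]
u2 = [7.048640, 0.000000, 0.000000, 3.744840, 0.000000, 0.000000, 2.739336]

print(round(kendall_tau_b(u1, u2), 6))  # 0.421637
print(round(kendall_tau_b(u2, u1), 6))  # 0.421637, argument order does not matter
```

Here there are 5 concordant pairs and 1 discordant pair, with 15 of the 21 pairs tied in u1 and 6 tied in u2, so the value is (5 - 1) / sqrt(6 * 15) = 4 / sqrt(90) ≈ 0.421637. This agrees with what Pandas 1.4.1 returns, and since the formula is symmetric in its two inputs, the concat order should not change the result.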

Upvotes: 0
