I have been trying to compute Kendall's tau rank correlation coefficient between two series with the Pandas library (Python), using different methods. Surprisingly, the results differ between Series and DataFrame inputs, and even change with the order in which the series are concatenated into the DataFrame.
(links to the Pandas documentation: https://pandas.pydata.org/docs/reference/api/pandas.Series.corr.html , https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.corr.html#pandas.DataFrame.corr)
To give an example, here are two sample series and the three calls I compared:
import pandas as pd
u1 = pd.Series([0.000000, 0.000000, 0.000000, 1.744147, 0.000000, 0.000000, 0.000000])
u2 = pd.Series([7.048640, 0.000000, 0.000000, 3.744840, 0.000000, 0.000000, 2.739336])
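# Series.corr returns a single float
print('Method for series', u1.corr(u2, method='kendall'))
# DataFrame.corr returns a 2x2 matrix; the off-diagonal entries are the coefficient
print('Method for dataframes #1', pd.concat([u1, u2], axis=1).corr(method='kendall'))
# Same data with the column order swapped
print('Method for dataframes #2', pd.concat([u2, u1], axis=1).corr(method='kendall'))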
I really don't understand why the correlation results are different, given that the inputs are the same.
Any help would be greatly appreciated!
Issue solved!
It was a version issue with Pandas. After upgrading from Pandas 1.3.1 to Pandas 1.4.1, all three methods return the same coefficient: 0.421637.
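As an independent check on the 0.421637 value, Kendall's tau-b can be computed by counting pairs directly. The sketch below uses a hypothetical helper, `kendall_tau_b` (not part of pandas), and assumes the tau-b variant, which corrects for the many ties in u1 and u2. For these series there are 21 pairs: 5 concordant, 1 discordant, 15 tied in u1 and 6 tied in u2, so tau-b = (5 - 1) / sqrt((21 - 15) * (21 - 6)) = 4 / sqrt(90), roughly 0.421637.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b by O(n^2) pair counting (hypothetical helper, not pandas' code)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n_pairs = concordant = discordant = tied_x = tied_y = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        n_pairs += 1
        dx, dy = xi - xj, yi - yj
        if dx == 0:
            tied_x += 1  # pairs tied in x (including pairs tied in both)
        if dy == 0:
            tied_y += 1  # pairs tied in y (including pairs tied in both)
        if dx * dy > 0:
            concordant += 1  # both series move in the same direction
        elif dx * dy < 0:
            discordant += 1  # the series move in opposite directions
    denom = sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    if denom == 0:
        return float("nan")  # undefined when a series is constant
    return (concordant - discordant) / denom


u1 = [0.000000, 0.000000, 0.000000, 1.744147, 0.000000, 0.000000, 0.000000]
u2 = [7.048640, 0.000000, 0.000000, 3.744840, 0.000000, 0.000000, 2.739336]

print(round(kendall_tau_b(u1, u2), 6))  # 0.421637
print(round(kendall_tau_b(u2, u1), 6))  # 0.421637
```

Because the pair counts do not depend on which series comes first, swapping the arguments gives the same value, which is what one would expect from all three Pandas calls in the question.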