Stefano Potter

Calculating Kendall's tau using scipy and groupby

I have a csv file with precipitation data per year and per weather station. It looks like this:

station_id    year       Sum
 210018      1916      65.024
 210018      1917      35.941
 210018      1918      28.448
 210018      1919      68.58
 210018      1920      31.115
 215400      1916      44.958
 215400      1917      31.496
 215400      1918      38.989
 215400      1919      74.93
 215400      1920      53.5432

I want to compute Kendall's tau and its p-value separately for each unique station ID. For the data above, that means the correlation between Sum and year for station 210018 and, separately, for station 215400.

For station_id 210018 the correlation should be -0.20 with a p-value of 0.62, and for station_id 215400 it should be 0.40 with a p-value of 0.33.
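A hand check of these expected values, assuming no ties (true for this sample). A pair of years is concordant when Sum rises with year and discordant when it falls. For 210018 the concordant pairs are (1916, 1919), (1917, 1919), (1918, 1919) and (1918, 1920); for 215400 the only discordant pairs are (1916, 1917), (1916, 1918) and (1919, 1920). The quoted p-values match the large-sample normal approximation for S = n_c - n_d, where Φ is the standard normal CDF:

```latex
\begin{aligned}
\tau &= \frac{n_c - n_d}{\binom{n}{2}}, \qquad n = 5,\ \binom{5}{2} = 10 \\
\text{210018:}\quad n_c &= 4,\ n_d = 6 \;\Rightarrow\; \tau = \frac{4 - 6}{10} = -0.2 \\
\text{215400:}\quad n_c &= 7,\ n_d = 3 \;\Rightarrow\; \tau = \frac{7 - 3}{10} = 0.4 \\
\operatorname{Var}(S) &= \frac{n(n-1)(2n+5)}{18} = \frac{5 \cdot 4 \cdot 15}{18} = \frac{50}{3} \\
z_{210018} &= \frac{-2}{\sqrt{50/3}} \approx -0.49, \qquad z_{215400} = \frac{4}{\sqrt{50/3}} \approx 0.98 \\
p &= 2\bigl(1 - \Phi(|z|)\bigr) \approx 0.62 \ \text{(210018)}, \quad 0.33 \ \text{(215400)}
\end{aligned}
```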

I am trying to use this:

grouped=df.groupby(['station_id'])
grouped.aggregate([tau, p_value=sp.stats.kendalltau(df.year, df.Sum)])

This raises a SyntaxError at the equals sign after p_value.
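The SyntaxError comes from [tau, p_value=...]: Python does not allow name=value inside a list literal. kendalltau returns a (statistic, p-value) pair, which is unpacked with ordinary tuple assignment instead. Below is a minimal sketch that loops over the groups explicitly, with the sample rebuilt inline in place of reading the CSV. The method="asymptotic" argument is an assumption added to reproduce the normal-approximation p-values quoted above; recent SciPy versions otherwise default to an exact p-value for samples this small without ties.

```python
import pandas as pd
import scipy.stats as st

# Sample data from the question, rebuilt inline (the real data comes from the CSV).
df = pd.DataFrame({
    "station_id": [210018] * 5 + [215400] * 5,
    "year": [1916, 1917, 1918, 1919, 1920] * 2,
    "Sum": [65.024, 35.941, 28.448, 68.58, 31.115,
            44.958, 31.496, 38.989, 74.93, 53.5432],
})

results = {}
# Iterating over a GroupBy yields (key, sub-DataFrame) pairs, one per station.
for station, group in df.groupby("station_id"):
    # Tuple unpacking is the valid way to name both parts of kendalltau's result.
    tau, p_value = st.kendalltau(group["year"], group["Sum"], method="asymptotic")
    results[station] = (tau, p_value)
    print(station, round(tau, 2), round(p_value, 2))
```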

Any help would be appreciated.

Answers (1)

Alex Riley

One way to calculate this is to use apply on the groupby object, which calls kendalltau once on each station's rows:

>>> import scipy.stats as st
>>> df.groupby(['station_id']).apply(lambda x: st.kendalltau(x['year'], x['Sum']))
station_id
210018        (-0.2, 0.62420612399)
215400        (0.4, 0.327186890661)
dtype: object
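The apply above returns a Series of (tau, p-value) pairs. If separate columns are more convenient, one option is to return a pd.Series from the per-group function, so that apply assembles a DataFrame with one row per station. Two version notes, hedged: newer SciPy releases return a result object with statistic and pvalue attributes rather than a plain tuple, and with the default method="auto" they compute an exact p-value for small samples without ties, which on this data gives about 0.82 and 0.48 instead of the 0.62 and 0.33 shown. The sketch below passes method="asymptotic" to keep the normal-approximation values; the function name kendall_by_station is hypothetical.

```python
import pandas as pd
import scipy.stats as st

# Sample data from the question, rebuilt inline.
df = pd.DataFrame({
    "station_id": [210018] * 5 + [215400] * 5,
    "year": [1916, 1917, 1918, 1919, 1920] * 2,
    "Sum": [65.024, 35.941, 28.448, 68.58, 31.115,
            44.958, 31.496, 38.989, 74.93, 53.5432],
})

def kendall_by_station(group):
    # Kendall's tau between year and Sum for one station's rows;
    # "asymptotic" selects the normal-approximation p-value.
    tau, p_value = st.kendalltau(group["year"], group["Sum"], method="asymptotic")
    # Returning a Series makes apply() build named columns tau and p_value.
    return pd.Series({"tau": tau, "p_value": p_value})

# Selecting the two value columns keeps the grouping column out of each group.
result = df.groupby("station_id")[["year", "Sum"]].apply(kendall_by_station)
print(result.round(2))
```

Selecting [["year", "Sum"]] before apply also avoids the warning recent pandas versions emit when apply operates on the grouping column.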
