Reputation: 3577
I have a CSV file with precipitation data per year and per weather station. It looks like this:
station_id year Sum
210018 1916 65.024
210018 1917 35.941
210018 1918 28.448
210018 1919 68.58
210018 1920 31.115
215400 1916 44.958
215400 1917 31.496
215400 1918 38.989
215400 1919 74.93
215400 1920 53.5432
I want to compute Kendall's tau correlation and its p-value separately for each unique station ID. For the data above, that means the correlation between Sum and year for station_id 210018 and for station_id 215400.
For station_id 210018 the correlation should be -0.20 with a p-value of 0.62, and for station_id 215400 the correlation should be 0.40 with a p-value of 0.33.
I am trying to use this:
grouped=df.groupby(['station_id'])
grouped.aggregate([tau, p_value=sp.stats.kendalltau(df.year, df.Sum)])
The error returned is a syntax error on the equal sign after p_value.
Any help would be appreciated.
Upvotes: 3
Views: 3376
Reputation: 176978
One way to calculate this is to use apply on the groupby object:
>>> import scipy.stats as st
>>> df.groupby(['station_id']).apply(lambda x: st.kendalltau(x['year'], x['Sum']))
station_id
210018 (-0.2, 0.62420612399)
215400 (0.4, 0.327186890661)
dtype: object
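If you would rather have tau and the p-value as separate columns instead of a tuple per station, here is a minimal sketch. The helper name kendall_by_station is made up for illustration, and the DataFrame is rebuilt by hand from the sample rows in the question rather than read from the CSV. Note that the p-values you get may not match the ones above exactly: depending on the SciPy version, kendalltau may use an exact method for small samples instead of the normal approximation.

```python
import pandas as pd
from scipy import stats

# Sample data copied from the question (normally loaded with pd.read_csv).
df = pd.DataFrame({
    "station_id": [210018] * 5 + [215400] * 5,
    "year": [1916, 1917, 1918, 1919, 1920] * 2,
    "Sum": [65.024, 35.941, 28.448, 68.58, 31.115,
            44.958, 31.496, 38.989, 74.93, 53.5432],
})

def kendall_by_station(frame):
    """Hypothetical helper: one row per station with tau and p_value columns."""
    rows = {}
    for station, group in frame.groupby("station_id"):
        # kendalltau returns a (statistic, p-value) pair that can be unpacked.
        tau, p_value = stats.kendalltau(group["year"], group["Sum"])
        rows[station] = {"tau": tau, "p_value": p_value}
    return pd.DataFrame.from_dict(rows, orient="index").rename_axis("station_id")

result = kendall_by_station(df)
print(result)
```

The explicit loop does the same per-group computation as the apply call; it just makes the unpacking into two named columns visible.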
Upvotes: 10