Reputation: 23
I need to run the Mann-Kendall test on a large hydrological dataset (maximum discharge values across 4381 sub-basins). There are 70 max values for each sub-basin. I need a significance level of 0.1 rather than the default 0.05.
Here is what my data looks like:
"sub" "max"
1 2.195
1 3.753
1 2.941
1 2.152
1 3.363
... ...
4381 0.532
4381 1.108
4381 0.977
4381 0.483
4381 0.435
And here's my script:
import pymannkendall as mk
import pandas as pd
from tqdm import tqdm
# naming input and output
in_fname = 'noyear.csv'
out_fname = 'mannkendall3.csv'
# reading csv file
print("reading from file...")
raw = pd.read_csv(in_fname, sep=';', header=0)
# naming columns and converting to strings
sub = raw['sub']
max = raw['max']
raw['sub'] = raw['sub'].astype(str)
# creating DataFrame
out_tbl = pd.DataFrame(data={'sub': sub, 'max': max})
# applying MK
df_mk=out_tbl.groupby(sub)['max'].agg(mk.original_test(alpha =0.1)).reset_index()
# creating csv with output
df_mk.to_csv(out_fname, index=False, sep=';')
When doing this, I get the following error:
Traceback (most recent call last):
File "/Users/user/Desktop/PyCharmProject/mann-kendall3.py", line 24, in <module>
df_mk=out_tbl.groupby(sub)['max'].agg(mk.original_test(alpha = 0.1)).reset_index()
TypeError: original_test() missing 1 required positional argument: 'x_old'
What is x_old, and where should it go in this case? I am a beginner so any tips would be appreciated!
Upvotes: 2
Views: 463
Reputation: 7250
I assume we are talking about pyMannKendall, version 1.4.2. There's a slight discrepancy between the docstring of original_test and the actual code. In the description returned by help(mk.original_test), x is listed as an input parameter:
Input:
x: a vector (list, numpy array or pandas series) data
alpha: significance level (0.05 default)
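But the function signature is def original_test(x_old, alpha = 0.05) (see the code on GitHub). Here x_old is the input vector, which the docstring calls x. This parameter is required and cannot be omitted when calling the function. In your script, mk.original_test(alpha=0.1) calls the function immediately, with no data, instead of handing agg a function to call on each group, and that is what raises the TypeError. The line where the Mann-Kendall test is applied therefore needs to wrap the call, for example in a lambda: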
# applying MK
df_mk = out_tbl.groupby(sub)['max'].agg(lambda x: mk.original_test(x, alpha=0.1)).reset_index()
Or we can use partial to produce a new function which takes a vector as its single parameter:
from functools import partial
df_mk = out_tbl.groupby(sub)['max'].agg(partial(mk.original_test, alpha=0.1)).reset_index()
Upvotes: 1