Reputation: 987
I am using a function to find out how many rainfall events between magnitude thresholds result in flood events (denoted as 1 for flood, 0 for no flood). My data looks a bit like this:
Date rain Flood
2010-01-01 0.2 0
2010-01-02 2.7 0
2010-01-03 11.3 1
2010-01-04 0.1 0
2010-01-05 5.3 1
Here is my function:
def Thresholds(df, thresholds):
lower, upper = thresholds
u_mask = df['rain'] < upper
l_mask = df['rain'] > lower
between = df[l_mask & u_mask]
return between['Flood'].value_counts()
my_intervals = np.array([(0.0,2.5),(2.5,5),(5,10),(10,15),(15,20),(20,30),(30,np.inf)])
dframes = [YEAR1,YEAR2,YEAR3]
for df in dframes:
print('\n Dataframe')
for interval in my_intervals:
print(Thresholds(df,interval))
which works well, in that it provides me with counts of flood or not for each threshold, here's the output from a full dataframe:
Dataframe
0.0 304
1.0 5
Name: Flood, dtype: int64
0.0 7
Name: Flood, dtype: int64
0.0 5
Name: Flood, dtype: int64
0.0 2
1.0 2
Name: Flood, dtype: int64
Series([], Name: Flood, dtype: int64)
Series([], Name: Flood, dtype: int64)
Series([], Name: Flood, dtype: int64)
However the output is not in a form that I can do further calculations on. For example, I want to calculate the frequentist probability of floods occurring at each rainfall magnitude. To do this I anticipate needing to have the data in a dataframe, with columns for flood, or not, and rows titled by rainfall magnitude range (in this case labelled Threshold). So something like this:
Threshold '1' '0'
2.5 0 2
5.0 0 1
10.0 1 0
15.0 1 0
Any ideas out there? Thanks a lot in advance :)
Upvotes: 0
Views: 60
Reputation: 323316
So you can store them into list
then concat
it back
l=[]
x=[]
for df in dframes:
print('\n Dataframe')
for interval in my_intervals:
l.append(Thresholds(df,interval))
x.append(interval[0])
pd.concat(l,axis=1,keys=x)
Update
d = []
for df in dframes:
print('\n Dataframe')
l = []
x = []
for interval in my_intervals:
l.append(Thresholds(df,interval))
x.append(interval[0])
d.append(pd.concat(l,axis=1,keys=x))
pd.concat(d,keys=list(range(len(dframes))))
Upvotes: 1
Reputation: 19885
We can use groupby
with pd.cut
:
bins = np.append(my_intervals[:, 0], my_intervals[-1, 1])
result = df.groupby([pd.cut(df['rain'], bins), 'Flood']).size().unstack().fillna(0).astype(int)
print(result)
Output:
Flood 0 1
rain
(0.0, 2.5] 2 0
(2.5, 5.0] 1 0
(5.0, 10.0] 0 1
(10.0, 15.0] 0 1
Upvotes: 1