SHV_la
SHV_la

Reputation: 987

How to take the output of a function and put into pandas dataframe for further analysis?

I am using a function to find out how many rainfall events between magnitude thresholds result in flood events (denoted as 1 for flood, 0 for no flood). My data looks a bit like this:

      Date     rain      Flood
2010-01-01     0.2       0
2010-01-02     2.7       0   
2010-01-03     11.3      1 
2010-01-04     0.1       0
2010-01-05     5.3       1

Here is my function:

def Thresholds(df, thresholds):
    lower, upper = thresholds
    u_mask = df['rain'] < upper 
    l_mask = df['rain'] > lower
    between = df[l_mask & u_mask]
    return between['Flood'].value_counts()

my_intervals = np.array([(0.0,2.5),(2.5,5),(5,10),(10,15),(15,20),(20,30),(30,np.inf)])

dframes = [YEAR1,YEAR2,YEAR3]

for df in dframes:
    print('\n Dataframe')
    for interval in my_intervals:
        print(Thresholds(df,interval))

which works well, in that it provides me with counts of flood or not for each threshold, here's the output from a full dataframe:

Dataframe
0.0    304
1.0      5
Name: Flood, dtype: int64
0.0    7
Name: Flood, dtype: int64
0.0    5
Name: Flood, dtype: int64
0.0    2
1.0    2
Name: Flood, dtype: int64
Series([], Name: Flood, dtype: int64)
Series([], Name: Flood, dtype: int64)
Series([], Name: Flood, dtype: int64)

However the output is not in a form that I can do further calculations on. For example, I want to calculate the frequentist probability of floods occurring at each rainfall magnitude. To do this I anticipate needing to have the data in a dataframe, with columns for flood, or not, and rows titled by rainfall magnitude range (in this case labelled Threshold). So something like this:

Threshold     '1'     '0'
      2.5      0       2
      5.0      0       1
     10.0      1       0
     15.0      1       0

Any ideas out there? Thanks a lot in advance :)

Upvotes: 0

Views: 60

Answers (2)

BENY
BENY

Reputation: 323316

So you can store them into list then concat it back

l=[]
x=[]
for df in dframes:
    print('\n Dataframe')
    for interval in my_intervals:
        l.append(Thresholds(df,interval))
        x.append(interval[0])

pd.concat(l,axis=1,keys=x)

Update

d = []
for df in dframes:
    print('\n Dataframe')
    l = []
    x = []
    for interval in my_intervals:
        l.append(Thresholds(df,interval))
        x.append(interval[0])

    d.append(pd.concat(l,axis=1,keys=x))

pd.concat(d,keys=list(range(len(dframes))))

Upvotes: 1

gmds
gmds

Reputation: 19885

We can use groupby with pd.cut:

bins = np.append(my_intervals[:, 0], my_intervals[-1, 1])

result = df.groupby([pd.cut(df['rain'], bins), 'Flood']).size().unstack().fillna(0).astype(int)

print(result)

Output:

​Flood         0  1
rain              
(0.0, 2.5]    2  0
(2.5, 5.0]    1  0
(5.0, 10.0]   0  1
(10.0, 15.0]  0  1

Upvotes: 1

Related Questions