Boolean Check in a Pandas DataFrame based on Criteria at different Index values

Question

I would like to count how many times two criteria are met in a Pandas DataFrame, where the two criteria are evaluated at different index values. A snippet of the DataFrame is:

                GDP  USRECQ
DATE                        
1947-01-01       NaN       0
1947-04-01       NaN       0
1947-07-01       NaN       0
1947-10-01       NaN       0
1948-01-01  0.095023       0
1948-04-01  0.107998       0
1948-07-01  0.117553       0
1948-10-01  0.078371       0
1949-01-01  0.034560       1
1949-04-01 -0.004397       1

I would like to count the number of observations for which USRECQ[DATE+1]==1 and GDP[DATE]>a, provided GDP[DATE] is not NaN.

By DATE+1 and DATE I mean that the value of USRECQ should be checked at the date immediately following the date at which GDP is examined. Unfortunately, I do not know how to deal with the different time indices in my selection. Can someone kindly advise me on how to count the number of instances properly?

Alex Riley · Accepted Answer

One way of achieving this is to create a new column that holds the next value of 'USRECQ':

>>> df['USRECQ NEXT'] = df['USRECQ'].shift(-1)
>>> df
         DATE       GDP  USRECQ  USRECQ NEXT
0  1947-01-01       NaN       0            0
1  1947-04-01       NaN       0            0
2  1947-07-01       NaN       0            0
3  1947-10-01       NaN       0            0
4  1948-01-01  0.095023       0            0
5  1948-04-01  0.107998       0            0
6  1948-07-01  0.117553       0            0
7  1948-10-01  0.078371       0            1
8  1949-01-01  0.034560       1            1
9  1949-04-01 -0.004397       1          NaN

Then you could filter your DataFrame according to your requirements as follows:

>>> a = 0.01
>>> df[(df['USRECQ NEXT'] == 1) & (df['GDP'] > a) & (pd.notnull(df['GDP']))]

         DATE       GDP  USRECQ  USRECQ NEXT
7  1948-10-01  0.078371       0            1
8  1949-01-01  0.034560       1            1

To count the number of matching rows, you can just pass the filtered DataFrame to the built-in function len.
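Putting the steps together, here is a minimal self-contained sketch that rebuilds the sample data from the question and produces the count directly. The variable names next_usrecq, mask and count are my own, and the threshold a = 0.01 is just the illustrative value used above. It also assumes the rows are sorted by date with no gaps, so that "next row" really means "next quarter".

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; DATE is used as the index.
dates = pd.to_datetime([
    "1947-01-01", "1947-04-01", "1947-07-01", "1947-10-01",
    "1948-01-01", "1948-04-01", "1948-07-01", "1948-10-01",
    "1949-01-01", "1949-04-01",
])
df = pd.DataFrame(
    {
        "GDP": [np.nan, np.nan, np.nan, np.nan, 0.095023, 0.107998,
                0.117553, 0.078371, 0.034560, -0.004397],
        "USRECQ": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    },
    index=pd.DatetimeIndex(dates, name="DATE"),
)

a = 0.01  # illustrative threshold, same value as in the answer

# shift(-1) moves every USRECQ value up one row, so the row for date t
# holds USRECQ at t+1; the last row has no successor and becomes NaN.
next_usrecq = df["USRECQ"].shift(-1)

# Comparisons against NaN evaluate to False, so rows with a missing GDP
# (and the last row, whose shifted value is NaN) drop out of the mask.
mask = (next_usrecq == 1) & (df["GDP"] > a)

count = int(mask.sum())  # each True counts as 1
print(count)             # 2, the same as len(df[mask])
```

Because a comparison with NaN is already False, the explicit pd.notnull check is not strictly needed here, although keeping it does no harm and makes the intent clearer.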
