fabian
fabian

Reputation: 1881

Boolean Check in a Pandas DataFrame based on Criteria at different Index values

I would like to calculate the number of instances two criteria are fulfilled in a Pandas DataFrame at a different index value. A snipped of the DataFrame is:

                GDP  USRECQ
DATE                        
1947-01-01       NaN       0
1947-04-01       NaN       0
1947-07-01       NaN       0
1947-10-01       NaN       0
1948-01-01  0.095023       0
1948-04-01  0.107998       0
1948-07-01  0.117553       0
1948-10-01  0.078371       0
1949-01-01  0.034560       1
1949-04-01 -0.004397       1

I would like to count the number of observation for which USRECQ[DATE+1]==1 and GDP[DATE]>a if GDP[DATE]!='NAN'.

By referring to DATE+1 and DATE I mean that the value of USRECQ should be check at the subsequent date for which the value of GDP is examined. Unfortunately, I do not know how to address the deal with the different time indices in my selection. Can someone kindly advise me on how to count the number of instances properly?

Upvotes: 0

Views: 91

Answers (2)

Alex Riley
Alex Riley

Reputation: 176978

One may of achieving this is to create a new column to show what the next value of 'USRECQ' is:

>>> df['USRECQ NEXT'] = df['USRECQ'].shift(-1)
>>> df
         DATE       GDP  USRECQ  USRECQ NEXT
0  1947-01-01       NaN       0            0
1  1947-04-01       NaN       0            0
2  1947-07-01       NaN       0            0
3  1947-10-01       NaN       0            0
4  1948-01-01  0.095023       0            0
5  1948-04-01  0.107998       0            0
6  1948-07-01  0.117553       0            0
7  1948-10-01  0.078371       0            1
8  1949-01-01  0.034560       1            1
9  1949-04-01 -0.004397       1          NaN

Then you could filter your DataFrame according to your requirements as follows:

>>> a = 0.01
>>> df[(df['USRECQ NEXT'] == 1) & (df['GDP'] > a) & (pd.notnull(df['GDP']))]

         DATE       GDP  USRECQ  USRECQ NEXT
7  1948-10-01  0.078371       0            1
8  1949-01-01  0.034560       1            1

To count the number of rows in a DataFrame, you can just use the built-in function len.

Upvotes: 2

yatinla
yatinla

Reputation: 96

I think the DataFrame.shift method is the key to what you seek in terms of looking at the next index.

And Numpy's logical expressions can come in really handy for these sorts of things.

So if df is your dataframe then I think what you're looking for is something like

count = df[np.logical_and(df.shift(-1)['USRECQ'] == 1,df.GDP > -0.1)]

The example I used to test this is on github.

Upvotes: 1

Related Questions