00robinette
00robinette

Reputation: 567

Chain pandas conditions together

I am filtering a pandas dataframe so that it gives me a subset of data based on a date range.

leads_df = leads_df[(leads_df['year'] >= start_date_year) & (leads_df['year'] <= end_date_year)]

I would like to pass in an additional condition on the second part so that it returns all items in the end_date_year that occurred before a given quarter as such:

leads_df = leads_df[(leads_df['year'] >= start_date_year) & ((leads_df['year'] <= end_date_year) & (leads_df['quarter'] <= quarter))]

Sadly the above approach excludes all quarters less than or equal to quarter. Here is my question. Let's suppose quarter is 2. How do I get quarters 1 and 2 in end_date_year?

Upvotes: 0

Views: 481

Answers (2)

ALollz
ALollz

Reputation: 59579

Create a date column based on the year and quarter. Pandas can do this if you create the '{year}Q{quarter}' string. This simplifies the subsetting logic and makes the code much more legible.

import pandas as pd
df = pd.DataFrame({'year': [2010]*4+[2011]*4,
                   'quarter': [1,2,3,4]*2})

df['date'] = pd.to_datetime(df['year'].astype(str)+'Q'+df['quarter'].astype(str))

# pandas so smart, can compare Timestamp with strings :D
df[df.date.between('2010', '2011Q2')]

   year  quarter       date
0  2010        1 2010-01-01
1  2010        2 2010-04-01
2  2010        3 2010-07-01
3  2010        4 2010-10-01
4  2011        1 2011-01-01
5  2011        2 2011-04-01

Upvotes: 3

Quang Hoang
Quang Hoang

Reputation: 150825

Try:

before_this_year = (leads_df['year'] >= start_date_year) & (leads_df['year'] < end_date_year) 

this_year = (leads_df['year'] ==end_date_year) & (leads_df['quarter'] <= quarter)

leads_df = leads_df[before_this_year | this_year]

Upvotes: 2

Related Questions