Reputation: 567
I am filtering a pandas dataframe so that it gives me a subset of data based on a date range.
leads_df = leads_df[(leads_df['year'] >= start_date_year) & (leads_df['year'] <= end_date_year)]
I would like to pass in an additional condition on the second part so that it returns all items in the end_date_year
that occurred before a given quarter as such:
leads_df = leads_df[(leads_df['year'] >= start_date_year) & ((leads_df['year'] <= end_date_year) & (leads_df['quarter'] <= quarter))]
Sadly the above approach excludes all quarters less than or equal to quarter. Here is my question. Let's suppose quarter is 2. How do I get quarters 1 and 2 in end_date_year
?
Upvotes: 0
Views: 481
Reputation: 59579
Create a date
column based on the year and quarter. Pandas can do this if you create the '{year}Q{quarter}'
string. This simplifies the subsetting logic and makes the code much more legible.
import pandas as pd
df = pd.DataFrame({'year': [2010]*4+[2011]*4,
'quarter': [1,2,3,4]*2})
df['date'] = pd.to_datetime(df['year'].astype(str)+'Q'+df['quarter'].astype(str))
# pandas so smart, can compare Timestamp with strings :D
df[df.date.between('2010', '2011Q2')]
year quarter date
0 2010 1 2010-01-01
1 2010 2 2010-04-01
2 2010 3 2010-07-01
3 2010 4 2010-10-01
4 2011 1 2011-01-01
5 2011 2 2011-04-01
Upvotes: 3
Reputation: 150825
Try:
before_this_year = (leads_df['year'] >= start_date_year) & (leads_df['year'] < end_date_year)
this_year = (leads_df['year'] ==end_date_year) & (leads_df['quarter'] <= quarter)
leads_df = leads_df[before_this_year | this_year]
Upvotes: 2