Reputation: 661
I have a pandas dataframe called df1 and would like to filter it based on conditions in dataframe df2: for each grp_id, I only want the rows from the year given in the year column of df2 up to the most recent year (2016), as shown in df3. This is just a subset of my data; I have at least 10 unique grp_id values to subset, each with a different start year.
df1
db_id cert_status grp_id year cap prov
130 IX-011 not-certified member SD 2004-01-01 30.0 KB
131 IX-011 not-certified member SD 2005-01-01 30.0 KB
132 IX-011 not-certified member SD 2006-01-01 30.0 KB
133 IX-011 not-certified member SD 2007-01-01 30.0 KB
134 IX-011 not-certified member SD 2008-01-01 30.0 KB
135 IX-011 not-certified member SD 2009-01-01 30.0 KB
136 IX-011 not-certified member SD 2010-01-01 30.0 KB
137 IX-011 not-certified member SD 2011-01-01 30.0 KB
138 IX-011 not-certified member SD 2012-01-01 30.0 KB
139 IX-011 not-certified member SD 2013-01-01 30.0 KB
140 IX-011 not-certified member SD 2014-01-01 30.0 KB
141 IX-011 not-certified member SD 2015-01-01 30.0 KB
142 IX-011 not-certified member SD 2016-01-01 30.0 KB
208 IX-017 not-certified member CG 2004-01-01 30.0 KB
209 IX-017 not-certified member CG 2005-01-01 30.0 KB
210 IX-017 not-certified member CG 2006-01-01 30.0 KB
211 IX-017 not-certified member CG 2007-01-01 30.0 KB
212 IX-017 not-certified member CG 2008-01-01 30.0 KB
213 IX-017 not-certified member CG 2009-01-01 30.0 KB
214 IX-017 not-certified member CG 2010-01-01 30.0 KB
215 IX-017 not-certified member CG 2011-01-01 30.0 KB
216 IX-017 not-certified member CG 2012-01-01 30.0 KB
217 IX-017 not-certified member CG 2013-01-01 80.0 KB
218 IX-017 not-certified member CG 2014-01-01 30.0 KB
219 IX-017 not-certified member CG 2015-01-01 30.0 KB
220 IX-017 not-certified member CG 2016-01-01 30.0 KB
df2
grp_id member year
4 SD Y 2007-01-01
6 CG Y 2011-01-01
df3
db_id cert_status grp_id year cap prov
133 IX-011 not-certified member SD 2007-01-01 30.0 KB
134 IX-011 not-certified member SD 2008-01-01 30.0 KB
135 IX-011 not-certified member SD 2009-01-01 30.0 KB
136 IX-011 not-certified member SD 2010-01-01 30.0 KB
137 IX-011 not-certified member SD 2011-01-01 30.0 KB
138 IX-011 not-certified member SD 2012-01-01 30.0 KB
139 IX-011 not-certified member SD 2013-01-01 30.0 KB
140 IX-011 not-certified member SD 2014-01-01 30.0 KB
141 IX-011 not-certified member SD 2015-01-01 30.0 KB
142 IX-011 not-certified member SD 2016-01-01 30.0 KB
215 IX-017 not-certified member CG 2011-01-01 30.0 KB
216 IX-017 not-certified member CG 2012-01-01 30.0 KB
217 IX-017 not-certified member CG 2013-01-01 80.0 KB
218 IX-017 not-certified member CG 2014-01-01 30.0 KB
219 IX-017 not-certified member CG 2015-01-01 30.0 KB
220 IX-017 not-certified member CG 2016-01-01 30.0 KB
What would be the easiest and quickest way to go about doing this?
Upvotes: 1
Views: 159
Reputation: 153550
Try using merge with query to filter:
# Carry df1's index through the merge as a column, then restore it afterwards
df1.reset_index().merge(df2, on='grp_id', suffixes=('', '_2'))\
.query('year >= year_2').set_index('index')[df1.columns]
Output:
db_id cert_status grp_id year cap prov
133 IX-011 not-certified member SD 2007-01-01 30.0 KB
134 IX-011 not-certified member SD 2008-01-01 30.0 KB
135 IX-011 not-certified member SD 2009-01-01 30.0 KB
136 IX-011 not-certified member SD 2010-01-01 30.0 KB
137 IX-011 not-certified member SD 2011-01-01 30.0 KB
138 IX-011 not-certified member SD 2012-01-01 30.0 KB
139 IX-011 not-certified member SD 2013-01-01 30.0 KB
140 IX-011 not-certified member SD 2014-01-01 30.0 KB
141 IX-011 not-certified member SD 2015-01-01 30.0 KB
142 IX-011 not-certified member SD 2016-01-01 30.0 KB
215 IX-017 not-certified member CG 2011-01-01 30.0 KB
216 IX-017 not-certified member CG 2012-01-01 30.0 KB
217 IX-017 not-certified member CG 2013-01-01 80.0 KB
218 IX-017 not-certified member CG 2014-01-01 30.0 KB
219 IX-017 not-certified member CG 2015-01-01 30.0 KB
220 IX-017 not-certified member CG 2016-01-01 30.0 KB
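As an alternative, here is a minimal sketch that skips the merge and looks up each row's start year directly, which keeps the original index of df1 untouched. The small frames below are illustrative stand-ins rather than the full data from the question, and the sketch assumes that grp_id is unique in df2 and that both year columns are datetimes.

```python
import pandas as pd

# Small stand-in frames; the values are illustrative, not the asker's full data.
df1 = pd.DataFrame(
    {
        "grp_id": ["SD"] * 4 + ["CG"] * 4,
        "year": pd.to_datetime(
            ["2005-01-01", "2006-01-01", "2007-01-01", "2008-01-01",
             "2009-01-01", "2010-01-01", "2011-01-01", "2012-01-01"]
        ),
        "cap": [30.0] * 8,
    },
    index=[131, 132, 133, 134, 213, 214, 215, 216],
)
df2 = pd.DataFrame(
    {"grp_id": ["SD", "CG"], "year": pd.to_datetime(["2007-01-01", "2011-01-01"])},
    index=[4, 6],
)

# Look up each row's start year by grp_id (assumes grp_id is unique in df2).
start_year = df1["grp_id"].map(df2.set_index("grp_id")["year"])

# Groups missing from df2 map to NaT, so the comparison is False and they drop out.
df3 = df1[df1["year"] >= start_year]
print(df3)
```

Because the boolean mask is aligned with df1, the result keeps the row labels of df1 and needs no suffix handling for the duplicated year column.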
Upvotes: 2