Reputation: 2219
The first row of every group 'ID'
has an end_date
, and I am trying to identify any rows within the group, that have a begin_date
equal to the end_date
of the first row. I need to return the type
for the row that matches the dates. If there are multiple matches, the first is sufficient. If there are no matches then return 'non-existent'.
df
ID color begin_date end_date type
1 red 2017-01-01 2017-01-07 Professional
1 green 2017-01-05 2017-01-07 Aquatic
1 blue 2017-01-07 2017-01-15 Superhero
1 red 2017-01-11 2017-01-22 Chocolate
2 red 2017-02-22 2017-02-26 Professional
2 green 2017-02-26 2017-02-28 Aquatic
2 blue 2017-02-26 2017-02-28 Superhero
2 red 2017-02-27 2017-02-28 Chocolate
3 red 2017-03-11 2017-03-22 Chocolate
if df.groupby('ID')['begin_date'].first() == df.groupby('ID')['end_date'].any():
return df.groupby('ID')['end_date'].any().to_dict()
else:
return 'non-existent'
final df
ID type
1 Superhero
2 Aquatic
3 non-existant
Upvotes: 1
Views: 84
Reputation: 51335
Here is another way, using groupby()
, nth()
and reindex()
:
df.groupby('ID').apply(lambda x: x.loc[x.begin_date.eq(x.end_date.iloc[0]), 'type']).groupby('ID').nth(0).reindex(df['ID'].unique(),fill_value='non existant')
ID
1 Superhero
2 Aquatic
3 non existant
Upvotes: 1
Reputation: 323226
IIUC
df.groupby('ID').apply(lambda x : df.loc[x['begin_date'].isin(x['end_date'].iloc[[0]]).idxmax(),'type'] if x['begin_date'].isin(x['end_date'].iloc[[0]]).any() else 'non-existent')
Out[23]:
ID
1 Superhero
2 Aquatic
3 non-existent
dtype: object
Upvotes: 2