Rajesh
Rajesh

Reputation: 786

Check if date is within other dates in the same group

I have a dataframe with some IDs and a few start dates and end dates corresponding to that id, that looks like,

df = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2,2],
             'start_date': ['2016-07-27 16:07:00','2016-10-20 08:10:00','2016-12-08 10:12:00','2017-07-16 11:07:00','2017-07-16 16:07:00','2016-07-27 16:07:00','2016-10-20 08:10:00','2016-12-08 10:12:00','2017-07-16 11:07:00','2017-07-16 16:07:00'],
             'end_date': ['2016-07-29 15:07:00','2017-08-10 07:04:00','2017-03-07 12:03:00','2017-07-18 11:07:00','2017-09-20 12:09:00','2016-07-29 15:07:00','2017-08-10 07:04:00','2017-03-07 12:03:00','2017-07-18 11:07:00','2017-09-20 12:09:00']})

id start_date          end_date
1  2016-07-27 16:07:00 2016-07-29 15:07:00
1  2016-10-20 08:10:00 2017-08-10 07:04:00
1  2016-12-08 10:12:00 2017-03-07 12:03:00
1  2017-07-16 11:07:00 2017-07-18 11:07:00
1  2017-07-16 16:07:00 2017-09-20 12:09:00
2  2016-07-27 16:07:00 2016-07-29 15:07:00
2  2016-10-20 08:10:00 2017-08-10 07:04:00
2  2016-12-08 10:12:00 2017-03-07 12:03:00
2  2017-07-16 11:07:00 2017-07-18 11:07:00
2  2017-07-16 16:07:00 2017-09-20 12:09:00

I want to see if the start_date of a row is between any another start and end date for that same id and with how many records the start_date is in between the start and end date. The resulting output should look like,

id start_date          end_date             count_col
1  2016-07-27 16:07:00 2016-07-29 15:07:00  0
1  2016-10-20 08:10:00 2017-08-10 07:04:00  0
1  2016-12-08 10:12:00 2017-03-07 12:03:00  1
1  2017-07-16 11:07:00 2017-07-18 11:07:00  1
1  2017-07-16 16:07:00 2017-09-20 12:09:00  2
2  2016-07-27 16:07:00 2016-07-29 15:07:00  0
2  2016-10-20 08:10:00 2017-08-10 07:04:00  0
2  2016-12-08 10:12:00 2017-03-07 12:03:00  1
2  2017-07-16 11:07:00 2017-07-18 11:07:00  1
2  2017-07-16 16:07:00 2017-09-20 12:09:00  2

I tried something like,

def start_date_compare(start_date):
    date_within = df.apply(lambda x: ((x['start_date']<=start_date) & (x['end_date']>=start_date)), axis=1)
    return date_within.sum(axis=0)

df['count_col'] = df.groupby(['id'])['start_date'].apply(lambda x: start_date_compare(x))

But this checks a row with itself too, and also doesn't compare with the same id alone.

Upvotes: 2

Views: 208

Answers (1)

jezrael
jezrael

Reputation: 862441

There is always added original row, so only subtract 1, also lambda function here is not necessary:

EDIT:

For test values per groups use:

df.start_date = pd.to_datetime(df.start_date)
df.end_date = pd.to_datetime(df.end_date)

def start_date_compare(subdf):
    date_within = subdf.apply(lambda x: ((x['start_date']<=subdf['start_date']) &
                                         (x['end_date']>=subdf['start_date'])), axis=1)
    subdf['count_col'] = date_within.sum(axis=0) - 1
    return subdf

df = df.groupby('id').apply(start_date_compare)
print (df)
   id          start_date            end_date  count_col
0   1 2016-07-27 16:07:00 2016-07-29 15:07:00          0
1   1 2016-10-20 08:10:00 2017-08-10 07:04:00          0
2   1 2016-12-08 10:12:00 2017-03-07 12:03:00          1
3   1 2017-07-16 11:07:00 2017-07-18 11:07:00          1
4   1 2017-07-16 16:07:00 2017-09-20 12:09:00          2
5   2 2016-07-27 16:07:00 2016-07-29 15:07:00          0
6   2 2016-10-20 08:10:00 2017-08-10 07:04:00          0
7   2 2016-12-08 10:12:00 2017-03-07 12:03:00          1
8   2 2017-07-16 11:07:00 2017-07-18 11:07:00          1
9   2 2017-07-16 16:07:00 2017-09-20 12:09:00          2
    

Upvotes: 3

Related Questions