gbox
gbox

Reputation: 829

Removing Rows With The Same Id Based On Overlapping Time

I have the following dataframe

df = pd.DataFrame({'user':['Dan','Dan','Dan','Dan','Ron'], 'start':['2020-01-01 17:00:00',
'2020-01-01 16:20:00','2020-01-01 17:00:00', '2020-01-01 06:30:00','2020-01-01 17:00:00'],
'end':['2020-01-01 21:00:00', '2020-01-02 01:00:00','2020-01-01 21:15:00',
'2020-01-01 10:00:00','2020-01-01 21:00:00']})
user start end
Dan 2020-01-01 17:00:00 2020-01-01 21:00:00
Dan 2020-01-01 16:20:00 2020-01-02 01:00:00
Dan 2020-01-01 17:00:00 2020-01-01 21:15:00
Dan 2020-01-01 06:30:00 2020-01-01 10:00:00
Ron 2020-01-01 17:00:00 2020-01-01 21:00:00

For the same user, I would like to leave just one of the records which have overlapping start-end time intervals (no matter which) for example:

user start end
Dan 2020-01-01 17:00:00 2020-01-01 21:00:00
Dan 2020-01-01 06:30:00 2020-01-01 10:00:00
Ron 2020-01-01 17:00:00 2020-01-01 21:00:00

My try, for a given user to run:

idx, single_user, single_start, single_end = df.to_records()[0]
a = df.loc[((df['user'] == single_user) & ((df['start'] < single_start) & (df['end']< single_end)) 
|((df['start']> single_start) &(df['end'] > single_end)))].index.tolist()
a.append(df.iloc[idx].name)

And obtain the result using:

test_df.iloc[a]

There must be a better way, is there a pandas method to tackle it?

Upvotes: 1

Views: 59

Answers (1)

mozway
mozway

Reputation: 261840

Here is a potential approach:

# ensure datetime type for meaningful comparisons
df['end'] = pd.to_datetime(df['end'])
df['start'] = pd.to_datetime(df['start'])

# sort by start time
df = df.sort_values(by='start')

# valid data have a start time greater than the previous end time
# or no previous time (NaT in the shift)
s = df.groupby('user')['end'].shift()
df[df['start'].gt(s)|s.isna()]

Output:

  user               start                 end
3  Dan 2020-01-01 06:30:00 2020-01-01 10:00:00
1  Dan 2020-01-01 16:20:00 2020-01-02 01:00:00
4  Ron 2020-01-01 17:00:00 2020-01-01 21:00:00

Upvotes: 1

Related Questions