Fabio Lamanna

Reputation: 21542

pandas - drop rows under Datetime criteria

I'm working on a DataFrame df:

Datetime,User
2013-12-04 08:00:01,111
2013-12-04 09:00:02,111
2013-12-04 10:00:03,111
2013-12-04 09:00:04,112
2013-12-04 10:00:05,112
2013-12-04 11:00:06,112
2013-12-04 11:00:07,113
2013-12-04 11:00:08,113
2013-12-04 11:00:09,113
2013-12-04 13:00:10,114
2013-12-04 13:00:11,113
2013-12-04 12:01:11,115
2013-12-04 12:01:11,115
2013-12-04 12:01:11,115
2013-12-04 12:01:11,115
2013-12-04 12:01:11,115
2013-12-04 12:01:11,115
2013-12-04 12:01:11,115

with User and Datetime information. I would like to drop users that meet a certain Datetime criterion, for instance users that appear 3 or more times within the same minute of the same hour of the same day. Under this condition, users 113 and 115 should be dropped from the DataFrame. So far I have tried grouping by the User column and extracting information from the datetime objects, but with no results.

Upvotes: 0

Views: 843

Answers (1)

Plug4

Reputation: 3928

There is probably a nicer way to do this, but this is how I would do it:

import pandas as pd

# First set up the dataframe    
Datetime = ['2013-12-04 08:00:01',
            '2013-12-04 09:00:02',
            '2013-12-04 10:00:03',
            '2013-12-04 09:00:04',
            '2013-12-04 10:00:05',
            '2013-12-04 11:00:06',
            '2013-12-04 11:00:07',
            '2013-12-04 11:00:08',
            '2013-12-04 11:00:09',
            '2013-12-04 13:00:10',
            '2013-12-04 13:00:11',
            '2013-12-04 12:01:11',
            '2013-12-04 12:01:11',
            '2013-12-04 12:01:11',
            '2013-12-04 12:01:11',
            '2013-12-04 12:01:11',
            '2013-12-04 12:01:11',
            '2013-12-04 12:01:11']

# User column, in the same row order as the question's data
user = [111, 111, 111, 112, 112, 112, 113, 113, 113, 114, 113,
        115, 115, 115, 115, 115, 115, 115]

Datetime = [pd.to_datetime(t) for t in Datetime]

df = pd.DataFrame(data={'user': user}, index=Datetime)
df.index.name = 'time'
# Truncate each timestamp to the minute, so day, hour and minute are compared together
df['minute'] = df.index.floor('min')
# Number of rows each user has within each minute
per_minute = df.groupby(['user', 'minute']).size()
# Users that appear 3 or more times in any single minute
to_drop = per_minute[per_minute >= 3].index.get_level_values('user').unique()
# Drop every row of those users, then remove the helper column
df = df[~df['user'].isin(to_drop)].drop(columns='minute')
print(df)
                     user
time                     
2013-12-04 08:00:01   111
2013-12-04 09:00:02   111
2013-12-04 10:00:03   111
2013-12-04 09:00:04   112
2013-12-04 10:00:05   112
2013-12-04 11:00:06   112
2013-12-04 13:00:10   114
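
For reference, the criterion from the question (drop every user that appears 3 or more times within one calendar minute) can also be written as a small helper that reads the question's CSV directly. This is only a sketch under stated assumptions: the function name drop_frequent_users and its threshold parameter are made up for illustration, and the column names Datetime and User are taken from the question's data.

```python
import io

import pandas as pd

# Data copied from the question
CSV = """Datetime,User
2013-12-04 08:00:01,111
2013-12-04 09:00:02,111
2013-12-04 10:00:03,111
2013-12-04 09:00:04,112
2013-12-04 10:00:05,112
2013-12-04 11:00:06,112
2013-12-04 11:00:07,113
2013-12-04 11:00:08,113
2013-12-04 11:00:09,113
2013-12-04 13:00:10,114
2013-12-04 13:00:11,113
2013-12-04 12:01:11,115
2013-12-04 12:01:11,115
2013-12-04 12:01:11,115
2013-12-04 12:01:11,115
2013-12-04 12:01:11,115
2013-12-04 12:01:11,115
2013-12-04 12:01:11,115
"""


def drop_frequent_users(df, threshold=3):
    """Hypothetical helper: drop every user that has `threshold` or more
    rows within a single calendar minute (same day, hour and minute)."""
    # Truncate timestamps to the minute so the date is part of the key too
    minute = df['Datetime'].dt.floor('min')
    # Rows per (user, minute) pair
    per_minute = df.groupby(['User', minute]).size()
    # Users that reach the threshold in at least one minute
    frequent = per_minute[per_minute >= threshold].index.get_level_values('User').unique()
    # Keep only the rows of the remaining users
    return df[~df['User'].isin(frequent)]


df = pd.read_csv(io.StringIO(CSV), parse_dates=['Datetime'])
result = drop_frequent_users(df)
print(sorted(result['User'].unique().tolist()))  # [111, 112, 114]
```

Flooring to the minute rather than grouping on separate hour and minute columns keeps the day in the grouping key, so rows from 11:00 on two different days would not be counted together.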

Upvotes: 2
