Fabio Lamanna

pandas - how to filter "most frequent" Datetime objects

I'm working with a DataFrame like the following:

User_ID    Datetime
01    2014-01-01 08:00:00
01    2014-01-02 09:00:00
02    2014-01-02 10:00:00
02    2014-01-03 11:00:00
03    2014-01-04 12:00:00
04    2014-01-04 13:00:00
05    2014-01-02 14:00:00
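For reproducibility, the sample above can be rebuilt as follows. Storing User_ID as integers is an assumption (the answer's output below shows 1, 2, ...). `pd.to_datetime` turns the string timestamps into the datetime64 dtype that the `.dt` accessor requires.

```python
import pandas as pd

# Rebuild the sample table. User_ID as int is an assumption, not from the question.
df = pd.DataFrame({
    "User_ID": [1, 1, 2, 2, 3, 4, 5],
    "Datetime": pd.to_datetime([
        "2014-01-01 08:00:00",
        "2014-01-02 09:00:00",
        "2014-01-02 10:00:00",
        "2014-01-03 11:00:00",
        "2014-01-04 12:00:00",
        "2014-01-04 13:00:00",
        "2014-01-02 14:00:00",
    ]),
})

# The Datetime column must be datetime64 for .dt.month / .dt.to_period to work.
print(df.dtypes)
```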

I would like to filter Users under certain conditions based on the Datetime column, e.g. only Users with one occurrence per month, or only Users whose occurrences all fall in summer, etc.

So far I've grouped the df with:

g = df.groupby(['User_ID','Datetime']).size()

obtaining the "traces" in time of each User:

User_ID    Datetime
01    2014-01-01 08:00:00
      2014-01-02 09:00:00
02    2014-01-02 10:00:00
      2014-01-03 11:00:00
03    2014-01-04 12:00:00
04    2014-01-04 13:00:00
05    2014-01-02 14:00:00
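A minimal sketch of what `.size()` returns on this sample, assuming the integer User_ID used above: a Series indexed by a (User_ID, Datetime) MultiIndex whose values are row counts. Each user's trace length then comes from grouping on the first index level; the name `traces_per_user` is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "User_ID": [1, 1, 2, 2, 3, 4, 5],
    "Datetime": pd.to_datetime([
        "2014-01-01 08:00:00", "2014-01-02 09:00:00",
        "2014-01-02 10:00:00", "2014-01-03 11:00:00",
        "2014-01-04 12:00:00", "2014-01-04 13:00:00",
        "2014-01-02 14:00:00",
    ]),
})

g = df.groupby(["User_ID", "Datetime"]).size()
# Each value counts the rows sharing that exact (User_ID, Datetime) pair,
# which is 1 everywhere in this sample.
print((g == 1).all())  # True

# Number of traces per user, grouping on the "User_ID" index level.
traces_per_user = g.groupby(level="User_ID").size()
print(traces_per_user.tolist())  # [2, 2, 1, 1, 1]
```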

Then I applied a mask to filter, for instance, the Users with more than one trace:

mask = df.groupby('User_ID')['Datetime'].apply(lambda g: len(g)>1)
df = df[df['User_ID'].isin(mask[mask].index)]
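A self-contained version of the mask approach, under the same sample-data assumptions. The second selection swaps in `transform("size")`, which broadcasts each user's row count back onto that user's rows, so the `isin` lookup is not needed; it is an alternative, not what the question used.

```python
import pandas as pd

df = pd.DataFrame({
    "User_ID": [1, 1, 2, 2, 3, 4, 5],
    "Datetime": pd.to_datetime([
        "2014-01-01 08:00:00", "2014-01-02 09:00:00",
        "2014-01-02 10:00:00", "2014-01-03 11:00:00",
        "2014-01-04 12:00:00", "2014-01-04 13:00:00",
        "2014-01-02 14:00:00",
    ]),
})

# Boolean Series indexed by User_ID: True for users with more than one row.
mask = df.groupby("User_ID")["Datetime"].apply(lambda g: len(g) > 1)
kept = df[df["User_ID"].isin(mask[mask].index)]

# Row-aligned alternative: each row carries its user's row count.
kept_alt = df[df.groupby("User_ID")["Datetime"].transform("size") > 1]

print(kept["User_ID"].tolist())  # [1, 1, 2, 2]
print(kept.equals(kept_alt))     # True
```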

So this works. Instead of lambda g: len(g)>1, I'm looking for a function that can filter Users under different conditions, as described above; in particular, one that keeps only Users with one occurrence per month.
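One way to get such a function is to pass the condition in as a parameter and let `groupby(...).filter` apply it to each user. `filter_users`, `one_per_month` and `only_summer` are hypothetical helper names; treating June to August as summer is an assumption, and "one occurrence per month" is read here as "never more than one occurrence in the same calendar month".

```python
import pandas as pd

df = pd.DataFrame({
    "User_ID": [1, 1, 2, 2, 3, 4, 5],
    "Datetime": pd.to_datetime([
        "2014-01-01 08:00:00", "2014-01-02 09:00:00",
        "2014-01-02 10:00:00", "2014-01-03 11:00:00",
        "2014-01-04 12:00:00", "2014-01-04 13:00:00",
        "2014-01-02 14:00:00",
    ]),
})

def filter_users(frame, condition):
    """Keep all rows of users whose Datetime values satisfy `condition`.

    `condition` receives one user's Datetime Series and returns a single bool.
    """
    return frame.groupby("User_ID").filter(lambda g: condition(g["Datetime"]))

def one_per_month(times):
    # True if no calendar month (year + month) appears more than once.
    return not times.dt.to_period("M").duplicated().any()

def only_summer(times):
    # True if every occurrence is in June, July or August (assumed "summer").
    return times.dt.month.isin([6, 7, 8]).all()

more_than_one = filter_users(df, lambda t: len(t) > 1)
monthly_once = filter_users(df, one_per_month)
summer_only = filter_users(df, only_summer)

print(more_than_one["User_ID"].tolist())  # [1, 1, 2, 2]
print(monthly_once["User_ID"].tolist())   # [3, 4, 5]
print(summer_only.empty)                  # True
```

Any new rule only needs a new predicate on the Datetime Series; the grouping code stays the same.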

Answers (1)

EdChum

As long as your 'Datetime' column already has a datetime dtype and you are running pandas 0.15.0 or higher (the version that introduced the .dt accessor), you can group by the month in addition to the user ID and then filter the groups by their length:

In [29]:

df.groupby(['User_ID',df['Datetime'].dt.month]).filter(lambda x: len(x) > 1)
Out[29]:
   User_ID            Datetime
0        1 2014-01-01 08:00:00
1        1 2014-01-02 09:00:00
2        2 2014-01-02 10:00:00
3        2 2014-01-03 11:00:00
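Note that `dt.month` ignores the year, so January 2014 and January 2015 fall into the same group. A sketch, assuming a pandas version where `Series.dt.to_period` is available, that groups by calendar month instead. The 2015 row for user 3 is hypothetical data added only to show the difference; switching the condition to `len(x) == 1` keeps the user-months with exactly one occurrence.

```python
import pandas as pd

# Question's sample plus one hypothetical extra row (user 3, January 2015).
df = pd.DataFrame({
    "User_ID": [1, 1, 2, 2, 3, 4, 5, 3],
    "Datetime": pd.to_datetime([
        "2014-01-01 08:00:00", "2014-01-02 09:00:00",
        "2014-01-02 10:00:00", "2014-01-03 11:00:00",
        "2014-01-04 12:00:00", "2014-01-04 13:00:00",
        "2014-01-02 14:00:00", "2015-01-05 08:00:00",
    ]),
})

# Month number only: user 3's two Januaries end up in one group.
by_month = df.groupby(["User_ID", df["Datetime"].dt.month]).filter(lambda x: len(x) > 1)

# to_period("M") keeps 2014-01 and 2015-01 as distinct keys.
month_key = df["Datetime"].dt.to_period("M")
by_period = df.groupby(["User_ID", month_key]).filter(lambda x: len(x) > 1)

# Rows from user-months with exactly one occurrence.
once = df.groupby(["User_ID", month_key]).filter(lambda x: len(x) == 1)

print(by_month["User_ID"].tolist())   # [1, 1, 2, 2, 3, 3]
print(by_period["User_ID"].tolist())  # [1, 1, 2, 2]
print(once["User_ID"].tolist())       # [3, 4, 5, 3]
```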