Reputation: 21552
I'm working with a DataFrame like the following:
User_ID Datetime
01 2014-01-01 08:00:00
01 2014-01-02 09:00:00
02 2014-01-02 10:00:00
02 2014-01-03 11:00:00
03 2014-01-04 12:00:00
04 2014-01-04 13:00:00
05 2014-01-02 14:00:00
I would like to filter Users by conditions on the Datetime column, e.g. keep only Users with one occurrence / month, or only Users whose occurrences all fall in summer, etc.
So far I've grouped the df with:
g = df.groupby(['User_ID','Datetime']).size()
obtaining the "traces" in time of each User:
User_ID Datetime
01 2014-01-01 08:00:00
2014-01-02 09:00:00
02 2014-01-02 10:00:00
2014-01-03 11:00:00
03 2014-01-04 12:00:00
04 2014-01-04 13:00:00
05 2014-01-02 14:00:00
Then I applied a mask to filter, for instance, the Users with more than one trace:
mask = df.groupby('User_ID')['Datetime'].apply(lambda g: len(g)>1)
df = df[df['User_ID'].isin(mask[mask].index)]
This works. What I'm looking for is a function to use in place of lambda g: len(g)>1 that can filter Users under different conditions, as described above. In particular, I want to filter Users with one occurrence / month.
Upvotes: 1
Views: 677
Reputation: 394071
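So long as your 'Datetime' column already has a datetime dtype and you are running pandas 0.15.0 or higher (which added the .dt accessor), you can group by the month in addition to the user id and then filter the groups by their length. The example below keeps user/month groups with more than one row; to keep groups with exactly one row, change the test to len(x) == 1. Note that .dt.month is only the month number, so the same month in different years falls into the same group: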
In [29]:
df.groupby(['User_ID',df['Datetime'].dt.month]).filter(lambda x: len(x) > 1)
Out[29]:
User_ID Datetime
0 1 2014-01-01 08:00:00
1 1 2014-01-02 09:00:00
2 2 2014-01-02 10:00:00
3 2 2014-01-03 11:00:00
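If "one occurrence / month" is meant per user (keep a user only when every month in which they appear has exactly one row), one possible sketch follows. It is an assumption about the intent, not the only reading. The names month, per_month, one_per_month, in_summer and summer_only are made up for illustration, the sample frame is rebuilt from the question, and summer is assumed to mean June to August. It uses dt.to_period("M") so that the same month in different years is counted separately.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "User_ID": ["01", "01", "02", "02", "03", "04", "05"],
    "Datetime": pd.to_datetime([
        "2014-01-01 08:00:00",
        "2014-01-02 09:00:00",
        "2014-01-02 10:00:00",
        "2014-01-03 11:00:00",
        "2014-01-04 12:00:00",
        "2014-01-04 13:00:00",
        "2014-01-02 14:00:00",
    ]),
})

# Year-month period, so January 2014 and January 2015 stay separate
month = df["Datetime"].dt.to_period("M").rename("month")

# Number of rows in each (user, month) group, broadcast back onto the rows
# (count ignores missing values; assumes Datetime has none)
per_month = df.groupby(["User_ID", month])["Datetime"].transform("count")

# Keep a user only if all of their months have exactly one row
one_per_month = df[(per_month == 1).groupby(df["User_ID"]).transform("all")]

# Same pattern with a different condition: users seen only in June-August
in_summer = df["Datetime"].dt.month.isin([6, 7, 8])
summer_only = df[in_summer.groupby(df["User_ID"]).transform("all")]

print(one_per_month)
print(summer_only)
```

Only the row-level boolean condition changes between the two filters; the groupby(...).transform("all") step turns it into a per-user decision.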
Upvotes: 1