Reputation: 81
I am trying to get a breakdown of how many times an entry in the 'f' column is listed per minute.
import pandas as pd import datetime as dt f= ['f0001', 'f0001', 'f0001', 'f0001', 'f0020', 'f0008', 'f0001', 'f0005', 'f3203', 'f0002', 'f0002', 'f0001', 'f0201', 'f0001', 'f0439', 'f0233', 'f0008', 'f0003', 'f0009', 'f0005'] dates = ['20130101100103', '20130101100110', '20130101100125', '20130101100133', '20130101100100', '20130101100200', '20130101100200', '20130101100200', '20130101100200', '20130101100200', '20130101100200', '20130101100300', '20130101100300', '20130101100300', '20130101100300', '20130101100400', '20130101100400', '20130101100400', '20130101100400', '20130101100400'] d = {'date': dates} data = pd.DataFrame(d) data['user'] = f data.date = data.date.apply(str) data.date = data.date.apply(lambda x: dt.datetime.strptime(x,'%Y%m%d%H%M%S')) s = data.groupby([data.date.map(lambda t: t.minute)]).count()
But so far all I've got is the below
s date user date 1 5 5 2 6 6 3 4 4 4 5 5
Upvotes: 3
Views: 1710
Reputation: 8631
You were almost there. You just need to add data['user']
into your groupby clause as well.
data.groupby([[data.date.dt.minute, data['user']]).count().rename(columns={'date':'count'}).reset_index()
Output:
date user count
0 1 f0001 4
1 1 f0020 1
2 2 f0001 1
3 2 f0002 2
4 2 f0005 1
5 2 f0008 1
6 2 f3203 1
7 3 f0001 2
8 3 f0201 1
9 3 f0439 1
10 4 f0003 1
11 4 f0005 1
12 4 f0008 1
13 4 f0009 1
14 4 f0233 1
Upvotes: 5
Reputation: 51335
You can use pd.Grouper
:
data.groupby(pd.Grouper(key='date', freq='min'))['user'].value_counts()
Full example, including how to create your dataframe in a much easier way, using pandas
functions (I know that this wasn't the topic of your question, but I think the following makes it much easier rather than importing datetime
):
df = pd.DataFrame({'date':dates, 'user':f})
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d%H%M%S')
df.groupby(pd.Grouper(key='date', freq='min'))['user'].value_counts()
The result:
date user
2013-01-01 10:01:00 f0001 4
f0020 1
2013-01-01 10:02:00 f0002 2
f0001 1
f0005 1
f0008 1
f3203 1
2013-01-01 10:03:00 f0001 2
f0201 1
f0439 1
2013-01-01 10:04:00 f0003 1
f0005 1
f0008 1
f0009 1
f0233 1
Name: user, dtype: int64
Upvotes: 2