Ton

Reputation: 81

How to determine the number of entries per minute in pandas data

I am trying to get a breakdown of how many times each entry in the 'f' column appears per minute.

    import pandas as pd
    import datetime as dt

    f= ['f0001',
        'f0001',
        'f0001',
        'f0001',
        'f0020',
        'f0008',
        'f0001',
        'f0005',
        'f3203',
        'f0002',
        'f0002',
        'f0001',
        'f0201',
        'f0001',
        'f0439',
        'f0233',
        'f0008',
        'f0003',
        'f0009',
        'f0005']


    dates = ['20130101100103', '20130101100110',
               '20130101100125', '20130101100133',
               '20130101100100', '20130101100200',
               '20130101100200', '20130101100200',
               '20130101100200', '20130101100200',
               '20130101100200', '20130101100300',
               '20130101100300', '20130101100300',
               '20130101100300', '20130101100400',
               '20130101100400', '20130101100400',
               '20130101100400', '20130101100400']


    d = {'date':  dates}
    data = pd.DataFrame(d)
    data['user'] = f

    data.date = data.date.apply(str)
    data.date = data.date.apply(lambda x: dt.datetime.strptime(x, '%Y%m%d%H%M%S'))

    s = data.groupby([data.date.map(lambda t: t.minute)]).count()

But so far all I've got is the output below:

    s

          date  user
    date
    1        5     5
    2        6     6
    3        4     4
    4        5     5

Upvotes: 3

Views: 1710

Answers (2)

harpan

Reputation: 8631

You were almost there. You just need to add data['user'] into your groupby clause as well.

    data.groupby([data.date.dt.minute, data['user']]).count().rename(columns={'date': 'count'}).reset_index()

Output:

        date   user  count
    0      1  f0001      4
    1      1  f0020      1
    2      2  f0001      1
    3      2  f0002      2
    4      2  f0005      1
    5      2  f0008      1
    6      2  f3203      1
    7      3  f0001      2
    8      3  f0201      1
    9      3  f0439      1
    10     4  f0003      1
    11     4  f0005      1
    12     4  f0008      1
    13     4  f0009      1
    14     4  f0233      1
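
As a small aside (not part of the original answer): since only the row counts are needed, `size()` gives the same result without relying on the leftover `date` column for `count()`, so no rename is needed. A minimal sketch, assuming `data` has already been built and its `date` column parsed to datetimes as in the question:

    # Sketch: same grouping as above, but using size() instead of count()
    # Assumes `data` from the question, with data.date already parsed to datetimes
    per_minute = (data.groupby([data.date.dt.minute, data['user']])
                      .size()
                      .reset_index(name='count'))
    print(per_minute)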

Upvotes: 5

sacuL

Reputation: 51335

You can use pd.Grouper:

    data.groupby(pd.Grouper(key='date', freq='min'))['user'].value_counts()

Full example, including a much easier way to create your dataframe using pandas functions (I know this wasn't the topic of your question, but it's simpler than importing datetime):

    df = pd.DataFrame({'date': dates, 'user': f})

    df['date'] = pd.to_datetime(df['date'], format='%Y%m%d%H%M%S')

    df.groupby(pd.Grouper(key='date', freq='min'))['user'].value_counts()

The result:

    date                 user
    2013-01-01 10:01:00  f0001    4
                         f0020    1
    2013-01-01 10:02:00  f0002    2
                         f0001    1
                         f0005    1
                         f0008    1
                         f3203    1
    2013-01-01 10:03:00  f0001    2
                         f0201    1
                         f0439    1
    2013-01-01 10:04:00  f0003    1
                         f0005    1
                         f0008    1
                         f0009    1
                         f0233    1
    Name: user, dtype: int64
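
If you prefer a flat table like the one in the other answer, the MultiIndexed Series can be flattened back into a DataFrame. A minimal sketch (the column name 'count' is just an illustrative choice, not part of the original answer):

    # Continuing from the example above: df already has a parsed datetime 'date' column
    counts = df.groupby(pd.Grouper(key='date', freq='min'))['user'].value_counts()

    # Flatten the (date, user) MultiIndex into ordinary columns;
    # name='count' labels the values column and avoids a clash with the 'user' index level
    flat = counts.reset_index(name='count')
    print(flat)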

Upvotes: 2
