Josh Kidd
Josh Kidd

Reputation: 870

Pandas Groupby Date to work out frequency

My input is a two column dataframe like this:

Device, Date
123414, 2016-11-01
123456, 2016-10-25
123414, 2016-10-29
...

What I want to create is a dataframe with three columns, breaking down how many times we had each device log on once a day, twice, ect. As an example it should look something like:

Date,       freq, count
2016-10-01  01    296
2016-10-01  02    137
2016-10-01  03    30
2016-10-02  01    301
2016-10-02  02    201
...

I hope that's clear enough. The code I've written is:

df['Freq'] = 1
groupeddf = df.groupby(['Date','Device']).count()
finaldf = groupeddf.groupby(['Date','Freq']).count()

However it pops up with KeyError: 'Date'
does anyone have any suggestions on alternate approaches, or where I might be going wrong

Upvotes: 2

Views: 760

Answers (2)

Ted Petrou
Ted Petrou

Reputation: 61997

Perhaps there is some misunderstanding but I am reading the question that you would like frequency of the frequency of each device for each day. This looks to be a different interpretation of @jezrael. In that case you can call value_counts twice to get what you want

Using slightly different data than @jezrael you can do the following

print (df)
   Device        Date
0  123414  2016-11-01
1  123456  2016-11-01
2  123456  2016-10-25
3  123414  2016-10-25
4  123414  2016-10-25
5  123414  2016-10-29
6  123414  2016-10-29
7  123414  2016-10-29
8  123456  2016-10-29

df1 = df.groupby('Date')['Device'].apply(lambda x: pd.value_counts(x).value_counts()).reset_index()
df1.columns = ['Date', 'freq', 'count']
df1.sort_values(['Date', 'freq'])

With output

         Date  freq  count
1  2016-10-25     1      1
0  2016-10-25     2      1
3  2016-10-29     1      1
2  2016-10-29     3      1
4  2016-11-01     1      2

Upvotes: 0

jezrael
jezrael

Reputation: 863166

I think you can use transform by size and then aggregate size by new column freq:

print (df)
   Device       Date
0  123414 2016-11-01
1  123456 2016-11-01
2  123456 2016-10-25
3  123414 2016-10-29
4  123414 2016-10-29
5  123414 2016-10-29

df['freq'] = df.groupby(['Device','Date'])['Device'].transform('size')
print (df)
   Device       Date  freq
0  123414 2016-11-01     1
1  123456 2016-11-01     1
2  123456 2016-10-25     1
3  123414 2016-10-29     3
4  123414 2016-10-29     3
5  123414 2016-10-29     3

df = df.groupby(['Date','freq']).size().reset_index(name='count')
print (df)
        Date  freq  count
0 2016-10-25     1      1
1 2016-10-29     3      3
2 2016-11-01     1      2

Also if need all combination freq add unstack and stack:

df = df.groupby(['Date','freq'])
       .size()
       .unstack(fill_value=0)
       .stack()
       .reset_index(name='count')

print (df)
        Date  freq  count
0 2016-10-25     1      1
1 2016-10-25     3      0
2 2016-10-29     1      0
3 2016-10-29     3      3
4 2016-11-01     1      2
5 2016-11-01     3      0

Upvotes: 1

Related Questions