Reputation: 870
My input is a two column dataframe like this:
Device, Date
123414, 2016-11-01
123456, 2016-10-25
123414, 2016-10-29
...
What I want to create is a dataframe with three columns, breaking down how many times we had each device log on once a day, twice, ect. As an example it should look something like:
Date, freq, count
2016-10-01 01 296
2016-10-01 02 137
2016-10-01 03 30
2016-10-02 01 301
2016-10-02 02 201
...
I hope that's clear enough. The code I've written is:
df['Freq'] = 1
groupeddf = df.groupby(['Date','Device']).count()
finaldf = groupeddf.groupby(['Date','Freq']).count()
However it pops up with KeyError: 'Date'
does anyone have any suggestions on alternate approaches, or where I might be going wrong
Upvotes: 2
Views: 760
Reputation: 61997
Perhaps there is some misunderstanding but I am reading the question that you would like frequency of the frequency of each device for each day. This looks to be a different interpretation of @jezrael. In that case you can call value_counts
twice to get what you want
Using slightly different data than @jezrael you can do the following
print (df)
Device Date
0 123414 2016-11-01
1 123456 2016-11-01
2 123456 2016-10-25
3 123414 2016-10-25
4 123414 2016-10-25
5 123414 2016-10-29
6 123414 2016-10-29
7 123414 2016-10-29
8 123456 2016-10-29
df1 = df.groupby('Date')['Device'].apply(lambda x: pd.value_counts(x).value_counts()).reset_index()
df1.columns = ['Date', 'freq', 'count']
df1.sort_values(['Date', 'freq'])
With output
Date freq count
1 2016-10-25 1 1
0 2016-10-25 2 1
3 2016-10-29 1 1
2 2016-10-29 3 1
4 2016-11-01 1 2
Upvotes: 0
Reputation: 863166
I think you can use transform
by size
and then aggregate size
by new column freq
:
print (df)
Device Date
0 123414 2016-11-01
1 123456 2016-11-01
2 123456 2016-10-25
3 123414 2016-10-29
4 123414 2016-10-29
5 123414 2016-10-29
df['freq'] = df.groupby(['Device','Date'])['Device'].transform('size')
print (df)
Device Date freq
0 123414 2016-11-01 1
1 123456 2016-11-01 1
2 123456 2016-10-25 1
3 123414 2016-10-29 3
4 123414 2016-10-29 3
5 123414 2016-10-29 3
df = df.groupby(['Date','freq']).size().reset_index(name='count')
print (df)
Date freq count
0 2016-10-25 1 1
1 2016-10-29 3 3
2 2016-11-01 1 2
Also if need all combination freq
add unstack
and stack
:
df = df.groupby(['Date','freq'])
.size()
.unstack(fill_value=0)
.stack()
.reset_index(name='count')
print (df)
Date freq count
0 2016-10-25 1 1
1 2016-10-25 3 0
2 2016-10-29 1 0
3 2016-10-29 3 3
4 2016-11-01 1 2
5 2016-11-01 3 0
Upvotes: 1