Reputation: 510
Initially I had a DataFrame with one column of actions, indexed by a DatetimeIndex:
In [371]: dates
2013-12-29 19:21:00 action1
2013-12-29 19:21:01 action2
2013-12-29 19:21:11 action1
2013-12-29 19:21:13 action2
...
In [372]: dates.index
Out[372]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-12-29 19:02:27, ..., 2014-01-13 16:30:31]
Length: 108957, Freq: None, Timezone: None
I want to plot the number of actions of each type per day,
so I grouped the actions by date and action type, using agg:
grouped = dates.groupby([dates.index.to_period(freq = 'D'), 'actiontype']).agg(len)
This gave me a Series with a MultiIndex:
...
2014-01-13 action1 435
action2 2067
..
2014-01-14 action1 455
action2 1007
...
Which seems to be precisely what I need.
But when I tried to unstack the series to get rid of the MultiIndex and plot my data, I got this error:
In [379]: grouped.unstack()
ValueError: freq not specified and cannot be inferred from first element
What's my mistake here? Thank you.
Upvotes: 3
Views: 1903
Reputation: 14963
If you need to use .unstack() and it doesn't work with that MultiIndex, then, starting from the non-indexed data
index mydate action
0 2000-12-29 00:10:00 action1
1 2000-12-29 00:20:00 action2
2 2000-12-29 00:30:00 action2
3 2000-12-29 00:40:00 action1
4 2000-12-29 00:50:00 action1
5 2000-12-31 00:10:00 action1
6 2000-12-31 00:20:00 action2
7 2000-12-31 00:30:00 action2
you could do something like this:
df['day'] = df['mydate'].apply(lambda x: x.split()[0])
counts = df.groupby(['day', 'action']).agg(len)
Basically, you forget that the datetime is a datetime: you keep it as a string and keep only the date part, discarding the time. pandas will then know nothing about the time dimension, but counts.unstack() gives you
            mydate
action     action1 action2
day
2000-12-29       3       2
2000-12-31       1       2
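If you would rather keep real dates instead of strings, something along these lines may also work. This is only a sketch: it assumes mydate has already been parsed to datetimes (here with pd.to_datetime), it rebuilds the sample data above by hand, and the names day and table are made up for the example. It groups on the timestamp normalized to midnight rather than on a Period, and counts rows with size().

```python
import pandas as pd

# Sample data from above, with mydate parsed to real datetimes (an assumption).
df = pd.DataFrame({
    "mydate": pd.to_datetime([
        "2000-12-29 00:10:00", "2000-12-29 00:20:00", "2000-12-29 00:30:00",
        "2000-12-29 00:40:00", "2000-12-29 00:50:00", "2000-12-31 00:10:00",
        "2000-12-31 00:20:00", "2000-12-31 00:30:00",
    ]),
    "action": ["action1", "action2", "action2", "action1",
               "action1", "action1", "action2", "action2"],
})

# Truncate every timestamp to midnight so rows from the same day share one key.
day = df["mydate"].dt.normalize().rename("day")

# Count rows per (day, action); size() returns a Series with a MultiIndex.
counts = df.groupby([day, "action"]).size()

# Move the action level into the columns; missing combinations become 0.
table = counts.unstack(fill_value=0)
print(table)
```

Here table has one row per day and one column per action type, so table.plot() should draw one line per action (assuming matplotlib is installed). With a DatetimeIndex as in the question, dates.index.normalize() could probably serve as the day key in the same way.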
Upvotes: 2