Reputation: 510
I have a DataFrame with two columns. One of them is containing timestamps and another one - id of some action. Something like that:
2000-12-29 00:10:00 action1
2000-12-29 00:20:00 action2
2000-12-29 00:30:00 action2
2000-12-29 00:40:00 action1
2000-12-29 00:50:00 action1
...
2000-12-31 00:10:00 action1
2000-12-31 00:20:00 action2
2000-12-31 00:30:00 action2
I would like to know how many actions of certain type have been performed in a certain day. I.e. for every day, I need to count the number of occurrences of actionX and plot this data with date on X axis and number of occurrences of actionX on Y axes, for each date.
Of course I can count actions for each day naively just by iterating through my dataset. But what's the "right way" to do in with pandas/matplotlib?
Upvotes: 16
Views: 65768
Reputation: 872
I find the combo .count_values().plot.bar()
very intuitive to do histogram plot. It also puts categories in the right order for you and, in many cases where there are too many categories, you can simply do .count_values().iloc[:k].plot.bar()
.
So, what I would do in your case is to compute a new Pandas Series of date+action, formatted for readability, and then invoke one of the snippet above. The code might look like this:
date_and_action = df['date'].astype(str).str.slice(0, 10) + '_' + df['action']
date_and_action.count_values().iloc[:k].plot.bar()
Upvotes: 2
Reputation: 1166
You can get the counts by using
df.groupby([df.index.date, 'action']).count()
or you can plot directly using this method
df.groupby([df.index.date, 'action']).count().plot(kind='bar')
You could also just store the results to count
and then plot it separately. This is assuming that your index is already in datetimeindex format, otherwise follow the directions of @mkln above.
Upvotes: 24
Reputation: 14953
Starting from
mydate col_name
0 2000-12-29 00:10:00 action1
1 2000-12-29 00:20:00 action2
2 2000-12-29 00:30:00 action2
3 2000-12-29 00:40:00 action1
4 2000-12-29 00:50:00 action1
5 2000-12-31 00:10:00 action1
6 2000-12-31 00:20:00 action2
7 2000-12-31 00:30:00 action2
You can do
df['mydate'] = pd.to_datetime(df['mydate'])
df = df.set_index('mydate')
df['day'] = df.index.date
counts = df.groupby(['day', 'col_name']).agg(len)
but perhaps there's an even more straightforward way. the above should work anyway.
If you want to use counts as a DataFrame, I'd then transform it back
counts = pd.DataFrame(counts, columns=['count'])
Upvotes: 9