Timofey Pivsaev
Timofey Pivsaev

Reputation: 510

Plot number of occurrences from Pandas DataFrame

I have a DataFrame with two columns. One of them is containing timestamps and another one - id of some action. Something like that:

2000-12-29 00:10:00     action1
2000-12-29 00:20:00     action2
2000-12-29 00:30:00     action2
2000-12-29 00:40:00     action1
2000-12-29 00:50:00     action1
...
2000-12-31 00:10:00     action1
2000-12-31 00:20:00     action2
2000-12-31 00:30:00     action2

I would like to know how many actions of certain type have been performed in a certain day. I.e. for every day, I need to count the number of occurrences of actionX and plot this data with date on X axis and number of occurrences of actionX on Y axes, for each date.

Of course I can count actions for each day naively just by iterating through my dataset. But what's the "right way" to do in with pandas/matplotlib?

Upvotes: 16

Views: 65768

Answers (3)

minhle_r7
minhle_r7

Reputation: 872

I find the combo .count_values().plot.bar() very intuitive to do histogram plot. It also puts categories in the right order for you and, in many cases where there are too many categories, you can simply do .count_values().iloc[:k].plot.bar().

So, what I would do in your case is to compute a new Pandas Series of date+action, formatted for readability, and then invoke one of the snippet above. The code might look like this:

date_and_action = df['date'].astype(str).str.slice(0, 10) + '_' + df['action']
date_and_action.count_values().iloc[:k].plot.bar()

Upvotes: 2

David Hagan
David Hagan

Reputation: 1166

You can get the counts by using

df.groupby([df.index.date, 'action']).count()

or you can plot directly using this method

df.groupby([df.index.date, 'action']).count().plot(kind='bar')

You could also just store the results to count and then plot it separately. This is assuming that your index is already in datetimeindex format, otherwise follow the directions of @mkln above.

Upvotes: 24

mkln
mkln

Reputation: 14953

Starting from

                mydate col_name
0  2000-12-29 00:10:00  action1
1  2000-12-29 00:20:00  action2
2  2000-12-29 00:30:00  action2
3  2000-12-29 00:40:00  action1
4  2000-12-29 00:50:00  action1
5  2000-12-31 00:10:00  action1
6  2000-12-31 00:20:00  action2
7  2000-12-31 00:30:00  action2

You can do

df['mydate'] = pd.to_datetime(df['mydate'])
df = df.set_index('mydate')
df['day'] = df.index.date
counts = df.groupby(['day', 'col_name']).agg(len)

but perhaps there's an even more straightforward way. the above should work anyway.

If you want to use counts as a DataFrame, I'd then transform it back

counts = pd.DataFrame(counts, columns=['count'])

Upvotes: 9

Related Questions