gazm2k5

Reputation: 490

Stackplot with matplotlib and a grouped Pandas dataframe

The data I'm using is a conversation message log. I have a pandas DataFrame with datestamps as the index and two columns: one for "sender" and one for "message".

I'm simply trying to plot a stackplot of messages over time. I don't actually need the contents of message, so I've cleaned the data as follows:

Dummy data:

import pandas as pd
from pandas import Timestamp

df = pd.DataFrame({
    'date': [
        Timestamp('2019-07-29 19:58:00'), Timestamp('2019-07-29 20:03:00'),
        Timestamp('2019-08-01 19:22:00'), Timestamp('2019-08-01 19:23:00'),
        Timestamp('2019-08-01 19:25:00'), Timestamp('2019-08-04 11:28:00'),
        Timestamp('2019-08-04 11:29:00'), Timestamp('2019-08-04 11:29:00'),
        Timestamp('2019-08-04 12:43:00'), Timestamp('2019-08-04 12:49:00'),
        Timestamp('2019-08-04 12:51:00'), Timestamp('2019-08-04 12:51:00'),
        Timestamp('2019-08-25 22:33:00'), Timestamp('2019-08-27 11:55:00'),
        Timestamp('2019-08-27 18:35:00'), Timestamp('2019-11-06 18:53:00'),
        Timestamp('2019-11-06 18:54:00'), Timestamp('2019-11-06 20:42:00'),
        Timestamp('2019-11-07 00:16:00'), Timestamp('2019-11-07 15:24:00'),
        Timestamp('2019-11-07 16:06:00'), Timestamp('2019-11-08 11:48:00'),
        Timestamp('2019-11-08 11:53:00'), Timestamp('2019-11-08 11:55:00'),
        Timestamp('2019-11-08 11:55:00'), Timestamp('2019-11-08 11:59:00'),
        Timestamp('2019-11-08 12:03:00'), Timestamp('2019-12-24 13:40:00'),
        Timestamp('2019-12-24 13:42:00'), Timestamp('2019-12-24 13:43:00'),
        Timestamp('2019-12-24 13:44:00'), Timestamp('2019-12-24 13:44:00')],
    'sender': [
        'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2',
        'Person 1', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2',
        'Person 2', 'Person 2', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2',
        'Person 2', 'Person 1', 'Person 2', 'Person 2', 'Person 1', 'Person 2', 'Person 2',
        'Person 1', 'Person 2', 'Person 1', 'Person 2'],
    'message': [
        'Hello', 'Hi there', "How's things", 'good', 'I am glad', 'Me too.',
        'Then we are both glad', 'Indeed we are.',
        'I sure hope this is enough fake conversation for stackoverflow.',
        'Better write a few more messages just in case',
        "But the message content isn't relevant", 'Oh yeah.', "I'm going to stop now.",
        'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted',
        'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted',
        'redacted', 'redacted', 'redacted', 'redacted', 'redacted'],
}).set_index('date')  # resample() below needs the timestamps as the index
dfgrouped = df.groupby(["sender"])
dfgrouped[["sender"]].resample("D").count()

This gives a DataFrame indexed by sender and by day, holding the number of messages each sender sent on that day.

dfgrouped[["sender"]].get_group("Person 1").resample("D").count()

... would give a dataframe with just one user and their message counts per day.

I'd like to know how to use matplotlib to draw a stackplot where each "sender" is a separate layer. I haven't been able to achieve this either with a direct call:

ax.stackplot(dfgrouped[["sender"]].resample("D").count())

or through looping:

for i, sender in enumerate(df["sender"].unique()):
    axs[i].stackplot(dfgrouped[["sender"]].get_group(sender).resample("D").count())

Upvotes: 1

Views: 582

Answers (1)

Arne

Reputation: 10545

You can use pandas' own stackplot function, df.plot.area(). This is a wrapper for the Matplotlib function, working as a method on DataFrames. You just have to get your data in the right shape. With your groupby and count operations you're almost there:

import pandas as pd

df = pd.DataFrame({'sender': [
    'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 
    'Person 1', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 
    'Person 2', 'Person 2', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 
    'Person 2', 'Person 1', 'Person 2', 'Person 2', 'Person 1', 'Person 2', 'Person 2', 
    'Person 1', 'Person 2', 'Person 1', 'Person 2'], 
    'message': [
    'Hello', 'Hi there', "How's things", 'good', 'I am glad', 'Me too.', 
    'Then we are both glad', 'Indeed we are.', 
    'I sure hope this is enough fake conversation for stackoverflow.', 
    'Better write a few more messages just in case', 
    "But the message content isn't relevant", 'Oh yeah.', "I'm going to stop now.", 
    'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 
    'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 
    'redacted', 'redacted', 'redacted', 'redacted', 'redacted']}, 
    index = pd.DatetimeIndex([
    pd.Timestamp('2019-07-29 19:58:00'), pd.Timestamp('2019-07-29 20:03:00'), 
    pd.Timestamp('2019-08-01 19:22:00'), pd.Timestamp('2019-08-01 19:23:00'),
    pd.Timestamp('2019-08-01 19:25:00'), pd.Timestamp('2019-08-04 11:28:00'), 
    pd.Timestamp('2019-08-04 11:29:00'), pd.Timestamp('2019-08-04 11:29:00'), 
    pd.Timestamp('2019-08-04 12:43:00'), pd.Timestamp('2019-08-04 12:49:00'), 
    pd.Timestamp('2019-08-04 12:51:00'), pd.Timestamp('2019-08-04 12:51:00'), 
    pd.Timestamp('2019-08-25 22:33:00'), pd.Timestamp('2019-08-27 11:55:00'), 
    pd.Timestamp('2019-08-27 18:35:00'), pd.Timestamp('2019-11-06 18:53:00'), 
    pd.Timestamp('2019-11-06 18:54:00'), pd.Timestamp('2019-11-06 20:42:00'), 
    pd.Timestamp('2019-11-07 00:16:00'), pd.Timestamp('2019-11-07 15:24:00'), 
    pd.Timestamp('2019-11-07 16:06:00'), pd.Timestamp('2019-11-08 11:48:00'), 
    pd.Timestamp('2019-11-08 11:53:00'), pd.Timestamp('2019-11-08 11:55:00'), 
    pd.Timestamp('2019-11-08 11:55:00'), pd.Timestamp('2019-11-08 11:59:00'), 
    pd.Timestamp('2019-11-08 12:03:00'), pd.Timestamp('2019-12-24 13:40:00'), 
    pd.Timestamp('2019-12-24 13:42:00'), pd.Timestamp('2019-12-24 13:43:00'), 
    pd.Timestamp('2019-12-24 13:44:00'), pd.Timestamp('2019-12-24 13:44:00')]))

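# Count messages per sender per day; the result is indexed by (sender, date).
df_group = df.groupby(["sender"])
df_count = df_group[["sender"]].resample("D").count()

# Put each sender's daily counts side by side, one column per sender.
df_plot = pd.concat([df_count.loc['Person 1', :], 
                     df_count.loc['Person 2', :]], 
                    axis=1)
df_plot.columns = ['Sender 1', 'Sender 2']

# Draw the stacked area chart.
df_plot.plot.area()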

(Image: the resulting stacked area plot.)
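If you would rather call Matplotlib directly, as the question asks, here is a minimal sketch of one way to do it. It is not part of the original answer: the small `log` frame is made-up data, and `daily` and `layers` are illustrative names. The key point is that `stackplot` wants the x values first and then one row of y values per layer, so the counts are pivoted into one column per sender before plotting. This also avoids hard-coding the sender names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Small made-up message log with a DatetimeIndex, mirroring the question's layout.
log = pd.DataFrame(
    {"sender": ["Person 2", "Person 1", "Person 2", "Person 1", "Person 2"]},
    index=pd.to_datetime([
        "2019-07-29 19:58", "2019-07-29 20:03",
        "2019-08-01 19:22", "2019-08-01 19:23", "2019-08-01 19:25",
    ]),
)

# Count messages per (day, sender), then pivot the senders into columns.
daily = (
    log.groupby([log.index.floor("D"), "sender"])
       .size()
       .unstack(fill_value=0)
       .asfreq("D", fill_value=0)  # add days without any messages as zero rows
)

fig, ax = plt.subplots()
# stackplot takes x first, then a 2D array with one row per layer.
layers = ax.stackplot(daily.index, daily.to_numpy().T, labels=list(daily.columns))
ax.legend(loc="upper left")
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

With the data in this wide shape, `daily.plot.area()` should draw the same layers through pandas, so either route works once the pivot is done.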

Upvotes: 3
