Reputation: 490
The data I'm using is a conversation message log. I have a pandas DataFrame with timestamps as the index and two columns: one for "sender" and one for "message".
I'm simply trying to plot a stackplot of messages over time. I don't actually need the contents of message, so I've cleaned the data as follows:
Dummy data:
import pandas as pd
df = pd.DataFrame({'date': [
pd.Timestamp('2019-07-29 19:58:00'), pd.Timestamp('2019-07-29 20:03:00'),
pd.Timestamp('2019-08-01 19:22:00'), pd.Timestamp('2019-08-01 19:23:00'),
pd.Timestamp('2019-08-01 19:25:00'), pd.Timestamp('2019-08-04 11:28:00'),
pd.Timestamp('2019-08-04 11:29:00'), pd.Timestamp('2019-08-04 11:29:00'),
pd.Timestamp('2019-08-04 12:43:00'), pd.Timestamp('2019-08-04 12:49:00'),
pd.Timestamp('2019-08-04 12:51:00'), pd.Timestamp('2019-08-04 12:51:00'),
pd.Timestamp('2019-08-25 22:33:00'), pd.Timestamp('2019-08-27 11:55:00'),
pd.Timestamp('2019-08-27 18:35:00'), pd.Timestamp('2019-11-06 18:53:00'),
pd.Timestamp('2019-11-06 18:54:00'), pd.Timestamp('2019-11-06 20:42:00'),
pd.Timestamp('2019-11-07 00:16:00'), pd.Timestamp('2019-11-07 15:24:00'),
pd.Timestamp('2019-11-07 16:06:00'), pd.Timestamp('2019-11-08 11:48:00'),
pd.Timestamp('2019-11-08 11:53:00'), pd.Timestamp('2019-11-08 11:55:00'),
pd.Timestamp('2019-11-08 11:55:00'), pd.Timestamp('2019-11-08 11:59:00'),
pd.Timestamp('2019-11-08 12:03:00'), pd.Timestamp('2019-12-24 13:40:00'),
pd.Timestamp('2019-12-24 13:42:00'), pd.Timestamp('2019-12-24 13:43:00'),
pd.Timestamp('2019-12-24 13:44:00'), pd.Timestamp('2019-12-24 13:44:00')],
'sender': [
'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2',
'Person 1', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2',
'Person 2', 'Person 2', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2',
'Person 2', 'Person 1', 'Person 2', 'Person 2', 'Person 1', 'Person 2', 'Person 2',
'Person 1', 'Person 2', 'Person 1', 'Person 2'],
'message': [
'Hello', 'Hi there', "How's things", 'good', 'I am glad', 'Me too.',
'Then we are both glad', 'Indeed we are.',
'I sure hope this is enough fake conversation for stackoverflow.',
'Better write a few more messages just in case',
"But the message content isn't relevant", 'Oh yeah.', "I'm going to stop now.",
'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted',
'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted',
'redacted', 'redacted', 'redacted', 'redacted', 'redacted']})
df = df.set_index('date')  # resample() below needs the timestamps as the index
dfgrouped = df.groupby(["sender"])
dfgrouped[["sender"]].resample("D").count()
This gives a DataFrame grouped by sender, with the date as index and the number of messages sent on each day.
dfgrouped[["sender"]].get_group("Joe Bloggs").resample("D").count()
... would give a DataFrame with just one user and their message counts per day.
I'd like to know how to use matplotlib to plot a stackplot where each "sender" is a different layer. I haven't been able to achieve this through either
ax.stackplot(dfgrouped[["sender"]].resample("D").count())
or through looping:
for i, sender in enumerate(df["sender"].unique()):
    axs[i].stackplot(dfgrouped[["sender"]].get_group(sender).resample("D").count())
Upvotes: 1
Views: 582
Reputation: 10545
You can use pandas' own stacked-area method, DataFrame.plot.area(). It wraps the Matplotlib function and stacks the columns by default, so you just have to get your data into the right shape: one column per sender, one row per day. With your groupby and count operations you're almost there:
import pandas as pd
df = pd.DataFrame({'sender': [
'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2',
'Person 1', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2',
'Person 2', 'Person 2', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2',
'Person 2', 'Person 1', 'Person 2', 'Person 2', 'Person 1', 'Person 2', 'Person 2',
'Person 1', 'Person 2', 'Person 1', 'Person 2'],
'message': [
'Hello', 'Hi there', "How's things", 'good', 'I am glad', 'Me too.',
'Then we are both glad', 'Indeed we are.',
'I sure hope this is enough fake conversation for stackoverflow.',
'Better write a few more messages just in case',
"But the message content isn't relevant", 'Oh yeah.', "I'm going to stop now.",
'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted',
'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted',
'redacted', 'redacted', 'redacted', 'redacted', 'redacted']},
index = pd.DatetimeIndex([
pd.Timestamp('2019-07-29 19:58:00'), pd.Timestamp('2019-07-29 20:03:00'),
pd.Timestamp('2019-08-01 19:22:00'), pd.Timestamp('2019-08-01 19:23:00'),
pd.Timestamp('2019-08-01 19:25:00'), pd.Timestamp('2019-08-04 11:28:00'),
pd.Timestamp('2019-08-04 11:29:00'), pd.Timestamp('2019-08-04 11:29:00'),
pd.Timestamp('2019-08-04 12:43:00'), pd.Timestamp('2019-08-04 12:49:00'),
pd.Timestamp('2019-08-04 12:51:00'), pd.Timestamp('2019-08-04 12:51:00'),
pd.Timestamp('2019-08-25 22:33:00'), pd.Timestamp('2019-08-27 11:55:00'),
pd.Timestamp('2019-08-27 18:35:00'), pd.Timestamp('2019-11-06 18:53:00'),
pd.Timestamp('2019-11-06 18:54:00'), pd.Timestamp('2019-11-06 20:42:00'),
pd.Timestamp('2019-11-07 00:16:00'), pd.Timestamp('2019-11-07 15:24:00'),
pd.Timestamp('2019-11-07 16:06:00'), pd.Timestamp('2019-11-08 11:48:00'),
pd.Timestamp('2019-11-08 11:53:00'), pd.Timestamp('2019-11-08 11:55:00'),
pd.Timestamp('2019-11-08 11:55:00'), pd.Timestamp('2019-11-08 11:59:00'),
pd.Timestamp('2019-11-08 12:03:00'), pd.Timestamp('2019-12-24 13:40:00'),
pd.Timestamp('2019-12-24 13:42:00'), pd.Timestamp('2019-12-24 13:43:00'),
pd.Timestamp('2019-12-24 13:44:00'), pd.Timestamp('2019-12-24 13:44:00')]))
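# Group by sender, then count each sender's messages per calendar day.
df_group = df.groupby(["sender"])
df_count = df_group[["sender"]].resample("D").count()
# df_count has a (sender, date) MultiIndex; give each sender its own column.
df_plot = pd.concat([df_count.loc['Person 1', :],
                     df_count.loc['Person 2', :]],
                    axis=1)
df_plot.columns = ['Sender 1', 'Sender 2']
# Stacked area plot with one band per sender.
df_plot.plot.area()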
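If you have more than two senders, or want to call Matplotlib's stackplot directly as in the question, the same reshaping can be done without naming each sender. The following is only a sketch under a few assumptions: the helper names daily_counts and plot_stacked and the tiny demo frame are made up for illustration, and it counts the "message" column rather than "sender". The key point is that ax.stackplot() expects the x values and the y series as separate arguments, which is why passing the grouped DataFrame on its own does not work.

```python
import pandas as pd
import matplotlib.pyplot as plt


def daily_counts(df):
    """Return one row per day and one column per sender (message counts)."""
    # Series with a (sender, date) MultiIndex: messages per sender per day.
    counts = df.groupby("sender")["message"].resample("D").count()
    # Move the sender level into columns; days a sender has no row become 0.
    wide = counts.unstack(level=0, fill_value=0)
    # Make sure every calendar day is present, even if nobody wrote that day.
    return wide.asfreq("D", fill_value=0)


def plot_stacked(wide, ax=None):
    """Draw a stacked area chart with plain Matplotlib."""
    if ax is None:
        _, ax = plt.subplots()
    # stackplot takes x first, then the y series (here as a 2-D array,
    # one row per sender).
    ax.stackplot(wide.index, wide.T.to_numpy(), labels=list(wide.columns))
    ax.legend(loc="upper left")
    ax.set_ylabel("messages per day")
    return ax


# Hypothetical miniature log, only to show the shape of the result.
demo = pd.DataFrame(
    {"sender": ["A", "B", "A", "B"], "message": ["hi", "yo", "ok", "bye"]},
    index=pd.to_datetime([
        "2019-07-29 19:58", "2019-07-29 20:03",
        "2019-07-31 09:00", "2019-08-01 10:00",
    ]),
)
wide = daily_counts(demo)
ax = plot_stacked(wide)
```

With your own data you would call daily_counts(df) on the DataFrame above and finish with plt.show(). Since the reshaped frame has one column per sender, wide.plot.area() should give an equivalent chart through pandas.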
Upvotes: 3