Reputation: 219
I am processing a chatlog and my data consists of timestamps, usernames and messages. My goal is to plot the number of messages per month for several users, so that I can compare when users were active.
The problem is the x-axis. I would like it to show dates at the resampling frequency (in this case months). Instead, it seems that the MultiIndex of the grouped data is shown there. Also, the data seems to be grouped correctly, but there are three data points for each month in the plot.
I included some code to generate the random data. (I'm using Python 3.2)
Here is the current output (plot image not included) and the code that produces it:
import numpy as np
import time
import datetime
import pandas as pd
import matplotlib.pyplot as plt
from pandas.util.testing import rands
a=datetime.datetime(2012,12,3)
b=datetime.datetime(2013,12,3)
a_tstamp=time.mktime(a.timetuple())
b_tstamp=time.mktime(b.timetuple())
message_number=400
tstamps=np.random.random_integers(a_tstamp,b_tstamp,message_number)
tstamps.sort()
dates=[datetime.datetime.fromtimestamp(x) for x in tstamps]
usernames=[rands(4) for x in range(10)]
usernames=usernames*40
values=np.random.random_integers(0,45,message_number)
df=pd.DataFrame({'tstamps':dates,'usernames':usernames,'messages':[rands(5) for x in range(message_number)]})
df=df.set_index(df.tstamps)
grouped=df.groupby(df.usernames)
# trying to plot a trend to see how active users were over several months
plt.figure()
for k,g in grouped:
    g=g.resample('m',how='count')
    g.plot(style='*-',label=k )
plt.show()
plt.legend(loc='best')
plt.show()
Upvotes: 3
Views: 2254
Reputation: 35235
The problem: your resampled result is indexed by date and by column name (messages, tstamps, usernames), so each month contributes three rows, one per column:
2013-07-31  messages     3
            tstamps      3
            usernames    3
2013-08-31  messages     4
            tstamps      4
            usernames    4
Instead of resampling the whole group, select only the messages column and resample that:
plt.figure()
for k, g in grouped:
    messages = g.messages.resample('m', how='count')
    messages.plot(style='*-', label=k)
plt.show()
Now the Series being plotted is
2012-12-31 3
2013-01-31 3
2013-02-28 3
2013-03-31 4
...
And the output looks like this (plot image not included).
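For anyone trying this on a newer pandas: as far as I know, recent releases no longer accept the `how=` argument to `resample`, so the count has to be chained as a method call instead. Below is a minimal sketch of the same idea (select the `messages` column, then resample per user). The helper names `make_chatlog`, `monthly_message_counts` and `plot_monthly_counts`, the `user0`..`user9` names and the placeholder message text are my own assumptions for illustration, not part of the original code. I pass a `pd.offsets.MonthEnd()` object rather than a frequency string because the monthly string alias has changed between pandas versions.

```python
import numpy as np
import pandas as pd


def make_chatlog(n_messages=400, n_users=10, seed=0):
    """Build a random chatlog indexed by timestamp (hypothetical stand-in data)."""
    rng = np.random.default_rng(seed)
    start = pd.Timestamp("2012-12-03")
    end = pd.Timestamp("2013-12-03")
    span_seconds = int((end - start).total_seconds())
    # Random, sorted offsets (in seconds) from the start date.
    offsets = np.sort(rng.integers(0, span_seconds, size=n_messages))
    tstamps = start + pd.to_timedelta(offsets, unit="s")
    usernames = [f"user{i % n_users}" for i in range(n_messages)]
    messages = [f"message {i}" for i in range(n_messages)]
    return pd.DataFrame(
        {"usernames": usernames, "messages": messages},
        index=pd.DatetimeIndex(tstamps, name="tstamps"),
    )


def monthly_message_counts(df):
    """Return {username: Series of message counts per month}."""
    month_end = pd.offsets.MonthEnd()
    # Selecting the single 'messages' column before resampling yields one
    # value per month instead of one value per column per month.
    return {
        user: group["messages"].resample(month_end).count()
        for user, group in df.groupby("usernames")
    }


def plot_monthly_counts(counts):
    """Draw one line per user on a shared date axis."""
    # Imported here so the counting helpers work without matplotlib installed.
    import matplotlib.pyplot as plt

    plt.figure()
    for user, series in counts.items():
        series.plot(style="*-", label=user)
    plt.legend(loc="best")
    plt.show()


if __name__ == "__main__":
    chatlog = make_chatlog()
    plot_monthly_counts(monthly_message_counts(chatlog))
```

Each Series has a plain `DatetimeIndex` of month-end dates, so the x-axis shows dates rather than the (date, column) pairs from the question. Months in which a user posted nothing, but which fall between that user's first and last message, appear with a count of 0.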
Upvotes: 3