Stuart Kirkup
Stuart Kirkup

Reputation: 155

Aggregate time series with group by and create chart with multiple series

I have time series data and I want to create a chart of the monthly (x-axis) counts of the number of records (lines chart), grouped by sentiment (multiple lines)

Data looks like this

created_at                         id                   polarity  sentiment  
0  Fri Nov 02 11:22:47 +0000 2018  1058318498663870464  0.000000   neutral   
1  Fri Nov 02 11:20:54 +0000 2018  1058318026758598656  0.011905   neutral   
2  Fri Nov 02 09:41:37 +0000 2018  1058293038739607552  0.800000  positive   
3  Fri Nov 02 09:40:48 +0000 2018  1058292834699231233  0.800000  positive   
4  Thu Nov 01 18:23:17 +0000 2018  1058061933243518976  0.233333   neutral   
5  Thu Nov 01 17:50:39 +0000 2018  1058053723157618690  0.400000  positive   
6  Wed Oct 31 18:57:53 +0000 2018  1057708251758903296  0.566667  positive   
7  Sun Oct 28 17:21:24 +0000 2018  1056596810570100736  0.000000   neutral   
8  Sun Oct 21 13:00:53 +0000 2018  1053994531845296128  0.136364   neutral   
9  Sun Oct 21 12:55:12 +0000 2018  1053993101205868544  0.083333   neutral

So far I have managed to aggregate to the monthly totals, with the following code:

import pandas as pd

tweets = process_twitter_json(file_name) 
#print(tweets[:10])

df = pd.DataFrame.from_records(tweets)
print(df.head(10))

#make the string date into a date field    
df['tweet_datetime'] = pd.to_datetime(df['created_at'])
df.index = df['tweet_datetime']

#print('Monthly counts')
monthly_sentiment = df.groupby('sentiment')['tweet_datetime'].resample('M').count()

I'm struggling with how to chart the data.

Upvotes: 2

Views: 1527

Answers (1)

Stuart Kirkup
Stuart Kirkup

Reputation: 155

OK I changed the monthly aggregation method and used Grouper instead of resample, this meant that when I did the unstack() the resulting dataframe was vertical (deep and narrow) with dates as rows rather than horizontal with the dates as columns headers which meant I no longer had issues with dates being stored as strings when I came to chart it.

Full code:

import pandas as pd

tweets = process_twitter_json(file_name) 

df = pd.DataFrame.from_records(tweets)


df['tweet_datetime'] = pd.to_datetime(df['created_at'])
df.index = df['tweet_datetime']

grouper = df.groupby(['sentiment', pd.Grouper(key='tweet_datetime', freq='M')]).id.count()
result = grouper.unstack('sentiment').fillna(0)

##=================================================
##PLOTLY - charts in Jupyter

from plotly import __version__
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

print (__version__)# requires version >= 1.9.0

import plotly.graph_objs as go

init_notebook_mode(connected=True)

trace0 = go.Scatter(
    x = result.index,
    y = result['positive'],
    name = 'Positive',
    line = dict(
        color = ('rgb(205, 12, 24)'),
        width = 4)
)

trace1 = go.Scatter(
    x = result.index,
    y = result['negative'],
    name = 'Negative',
    line = dict(
        color = ('rgb(22, 96, 167)'),
        width = 4)
)    
trace2 = go.Scatter(
    x = result.index,
    y = result['neutral'],
    name = 'Neutral',
    line = dict(
        color = ('rgb(12, 205, 24)'),
        width = 4)
)

data = [trace0, trace1, trace2]

iplot(data)

Upvotes: 1

Related Questions