Reputation: 155
I have time series data and I want to create a chart of the monthly (x-axis) counts of the number of records (lines chart), grouped by sentiment (multiple lines)
Data looks like this
created_at id polarity sentiment
0 Fri Nov 02 11:22:47 +0000 2018 1058318498663870464 0.000000 neutral
1 Fri Nov 02 11:20:54 +0000 2018 1058318026758598656 0.011905 neutral
2 Fri Nov 02 09:41:37 +0000 2018 1058293038739607552 0.800000 positive
3 Fri Nov 02 09:40:48 +0000 2018 1058292834699231233 0.800000 positive
4 Thu Nov 01 18:23:17 +0000 2018 1058061933243518976 0.233333 neutral
5 Thu Nov 01 17:50:39 +0000 2018 1058053723157618690 0.400000 positive
6 Wed Oct 31 18:57:53 +0000 2018 1057708251758903296 0.566667 positive
7 Sun Oct 28 17:21:24 +0000 2018 1056596810570100736 0.000000 neutral
8 Sun Oct 21 13:00:53 +0000 2018 1053994531845296128 0.136364 neutral
9 Sun Oct 21 12:55:12 +0000 2018 1053993101205868544 0.083333 neutral
So far I have managed to aggregate to the monthly totals, with the following code:
import pandas as pd
tweets = process_twitter_json(file_name)
#print(tweets[:10])
df = pd.DataFrame.from_records(tweets)
print(df.head(10))
#make the string date into a date field
df['tweet_datetime'] = pd.to_datetime(df['created_at'])
df.index = df['tweet_datetime']
#print('Monthly counts')
monthly_sentiment = df.groupby('sentiment')['tweet_datetime'].resample('M').count()
I'm struggling with how to chart the data.
.unstack()
that turns the sentiment values into rows,
which is almost there, but the problem is dates become string column
headers, which is no good for chartingUpvotes: 2
Views: 1527
Reputation: 155
OK I changed the monthly aggregation method and used Grouper instead of resample, this meant that when I did the unstack() the resulting dataframe was vertical (deep and narrow) with dates as rows rather than horizontal with the dates as columns headers which meant I no longer had issues with dates being stored as strings when I came to chart it.
Full code:
import pandas as pd
tweets = process_twitter_json(file_name)
df = pd.DataFrame.from_records(tweets)
df['tweet_datetime'] = pd.to_datetime(df['created_at'])
df.index = df['tweet_datetime']
grouper = df.groupby(['sentiment', pd.Grouper(key='tweet_datetime', freq='M')]).id.count()
result = grouper.unstack('sentiment').fillna(0)
##=================================================
##PLOTLY - charts in Jupyter
from plotly import __version__
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
print (__version__)# requires version >= 1.9.0
import plotly.graph_objs as go
init_notebook_mode(connected=True)
trace0 = go.Scatter(
x = result.index,
y = result['positive'],
name = 'Positive',
line = dict(
color = ('rgb(205, 12, 24)'),
width = 4)
)
trace1 = go.Scatter(
x = result.index,
y = result['negative'],
name = 'Negative',
line = dict(
color = ('rgb(22, 96, 167)'),
width = 4)
)
trace2 = go.Scatter(
x = result.index,
y = result['neutral'],
name = 'Neutral',
line = dict(
color = ('rgb(12, 205, 24)'),
width = 4)
)
data = [trace0, trace1, trace2]
iplot(data)
Upvotes: 1