Reputation: 637
Suppose I made a groupby on the valgdata DataFrame like below:
grouped_valgdata = valgdata.groupby(['news_site','dato_uden_tid']).mean()
Now I get this:
news_site  dato_uden_tid
           2015-06-15     54.777183
           2015-06-16     54.703167
           2015-06-17     54.948775
           2015-06-18     54.424881
           2015-06-19     53.290554
           2015-06-15     53.279251
           2015-06-16     53.285643
           2015-06-17     53.558753
           2015-06-18     52.854750
           2015-06-19     54.415988
           2015-06-15     56.590428
           2015-06-16     55.313752
           2015-06-17     53.771377
           2015-06-18     53.218408
           2015-06-19     54.392638
           2015-06-15     54.759532
           2015-06-16     55.182641
           2015-06-17     55.001800
           2015-06-18     56.004326
           2015-06-19     54.649052
Now I want to make a time series for each news_site, where dato_uden_tid is on the x-axis and sentiment is on the y-axis.
What is the best and easiest way to accomplish that?
Thank you!
Upvotes: 9
Views: 18883
Reputation: 40869
Here is a solution using Pandas and Matplotlib with more fine-grained control.
First, here is a function that generates a random dataframe for testing. Importantly, it creates three columns that generalize to more abstract problems:
my_timestamp is a datetime column containing timestamps
my_series is the string label to which you want to apply the groupby
my_value is a numeric value recorded for my_series at time my_timestamp
Replace the column names with those of your own dataframe.
import random
import pandas as pd

def generate_random_data(N=100):
    """
    Returns a dataframe with N rows of random data.
    """
    list_of_lists = []
    labels = ['foo', 'bar', 'baz']
    epoch = 1515617110
    for _ in range(N):
        key = random.choice(labels)
        value = 0
        # Each label draws its values from a different range.
        if key == 'foo':
            value = random.randint(1, 10)
        elif key == 'bar':
            value = random.randint(50, 60)
        else:
            value = random.randint(80, 90)
        # Advance the timestamp by a random number of seconds.
        epoch += random.randint(5000, 30000)
        row = [key, epoch, value]
        list_of_lists.append(row)
    df = pd.DataFrame(list_of_lists, columns=['my_series', 'epoch', 'my_value'])
    df['my_timestamp'] = pd.to_datetime(df['epoch'], unit='s')
    df = df[['my_timestamp', 'my_series', 'my_value']]
    #df.set_index('ts', inplace=True)
    return df
Here is some example data that was generated:
Now, the following code will run the groupby and plot a nice time series graph.
import matplotlib.pyplot as plt
from matplotlib.dates import DayLocator, DateFormatter

def plot_gb_time_series(df, ts_name, gb_name, value_name, figsize=(20,7), title=None):
    '''
    Runs groupby on Pandas dataframe and produces a time series chart.

    df : Pandas dataframe
    ts_name : string
        The name of the df column that has the datetime timestamp x-axis values.
    gb_name : string
        The name of the df column to perform group-by.
    value_name : string
        The name of the df column for the y-axis.
    figsize : tuple of two integers
        Figure size of the resulting plot, e.g. (20, 7)
    title : string
        Optional title
    '''
    # Daily tick locator and date formatter (defined here but not applied to the axis below).
    xtick_locator = DayLocator(interval=1)
    xtick_dateformatter = DateFormatter('%m/%d/%Y')
    fig, ax = plt.subplots(figsize=figsize)
    # Group by the column name itself (not a one-element list) so each key is a plain label.
    for key, grp in df.groupby(gb_name):
        ax = grp.plot(ax=ax, kind='line', x=ts_name, y=value_name, label=key, marker='o')
    ax.legend(loc='upper left')
    _ = plt.xticks(rotation=90, )
    _ = plt.grid()
    _ = plt.xlabel('')
    _ = plt.ylim(0, df[value_name].max() * 1.25)
    _ = plt.ylabel(value_name)
    if title is not None:
        _ = plt.title(title)
    _ = plt.show()
Here is an example invocation:
df = generate_random_data()
plot_gb_time_series(df, 'my_timestamp', 'my_series', 'my_value',
                    figsize=(10, 5), title="Random data")
And here is the resulting time series plot:
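To use the same idea on the grouped result from the question, the two index levels would first need to become ordinary columns again. The following is only a minimal sketch under assumptions: the site names and values are made up, the value column is assumed to be called sentiment (as the question suggests), and the Agg backend is selected only so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook or interactive session
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical grouped frame shaped like the one in the question
# (site names and sentiment values are invented for illustration).
grouped_valgdata = pd.DataFrame(
    {"sentiment": [54.8, 54.7, 53.3, 53.2]},
    index=pd.MultiIndex.from_product(
        [["site_a", "site_b"], pd.to_datetime(["2015-06-15", "2015-06-16"])],
        names=["news_site", "dato_uden_tid"],
    ),
)

# reset_index() turns the two index levels back into regular columns.
long_df = grouped_valgdata.reset_index()

# Draw one line per news site on a shared axis.
fig, ax = plt.subplots(figsize=(10, 5))
for site, grp in long_df.groupby("news_site"):
    grp.plot(ax=ax, x="dato_uden_tid", y="sentiment", label=site, marker="o")
ax.set_ylabel("sentiment")
```

With the columns in this long format, long_df could also be passed to plot_gb_time_series as plot_gb_time_series(long_df, 'dato_uden_tid', 'news_site', 'sentiment').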
Upvotes: 11
Reputation: 76297
(Am a bit amused, as this question caught me doing the exact same thing.)
You could swap the order of the groupby keys and unstack the result, which would
reverse the groupby
unstack the news sites to be columns
To plot, just call .plot() on that result.
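A minimal sketch of this reverse-and-unstack approach might look like the following. The site names and sentiment values are made up for illustration, the column name sentiment is assumed from the question, and the Agg backend is selected only so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook or interactive session
import pandas as pd

# Hypothetical stand-in for valgdata (site names and values are invented).
valgdata = pd.DataFrame({
    "news_site": ["site_a", "site_a", "site_b", "site_b", "site_a", "site_b"],
    "dato_uden_tid": pd.to_datetime([
        "2015-06-15", "2015-06-16", "2015-06-15",
        "2015-06-16", "2015-06-15", "2015-06-16",
    ]),
    "sentiment": [54.0, 55.0, 53.0, 52.0, 56.0, 51.0],
})

# Group by date first and site second, so that unstack() moves the
# innermost level (news_site) into the columns.
wide = valgdata.groupby(["dato_uden_tid", "news_site"])["sentiment"].mean().unstack()

# Each column becomes one line, with the dates on the x-axis.
ax = wide.plot(marker="o")
ax.set_ylabel("sentiment")
```

The intermediate frame wide has one row per date and one column per news site, which is the shape DataFrame.plot() expects when drawing one line per column.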
Upvotes: 11