Chris Doyle

Reputation: 12027

Stacked area chart from dataframe

I have aggregated some log data into a simple CSV format; each row is essentially one API call. Each token has a limit/quota, and sometimes I get requests to increase a quota or limit. I would like to visualize the traffic so I can understand the throughput of each type of API call and the total throughput for all API calls.

I have played around with the data in pandas and can get it into a table-style structure, grouping API call counts per second.

                                  token
api          timestamp                 
ActivateAPI  2020-07-13 14:09:30      1
             2020-07-13 14:09:31      2
SuspendAPI   2020-07-13 14:09:23      1
             2020-07-13 14:09:31      2
             2020-07-13 14:09:32      2
TerminateAPI 2020-07-13 14:09:29      2
             2020-07-13 14:09:39      1
             2020-07-13 14:09:49      1

I have also worked through a matplotlib example, so I understand the concept of how to make a stacked area chart.

However, for the life of me I cannot seem to map my dataframe into a stacked area chart that shows time along the bottom (x-axis), counts up the left (y-axis), and one stacked area per API. Below is my code, which builds the dataframe and manually makes the stacked chart, but I need help getting the chart rendered from my dataframe so I can produce graphs for any of the webserver logs.

import matplotlib.pyplot as plt
import pandas as pd
from io import StringIO

date_format = "%m.%d.%Y %H:%M:%S,%f"
data = """timestamp~api~token
07.13.2020 14:09:23,928~SuspendAPI~TOKEN1
07.13.2020 14:09:29,324~TerminateAPI~TOKEN1
07.13.2020 14:09:29,424~TerminateAPI~TOKEN1
07.13.2020 14:09:30,678~ActivateAPI~TOKEN1
07.13.2020 14:09:31,678~ActivateAPI~TOKEN1
07.13.2020 14:09:31,886~SuspendAPI~TOKEN1
07.13.2020 14:09:31,886~SuspendAPI~TOKEN1
07.13.2020 14:09:31,978~ActivateAPI~TOKEN1
07.13.2020 14:09:32,786~SuspendAPI~TOKEN1
07.13.2020 14:09:32,886~SuspendAPI~TOKEN1
07.13.2020 14:09:39,324~TerminateAPI~TOKEN1
07.13.2020 14:09:49,324~TerminateAPI~TOKEN1"""

df = pd.read_csv(StringIO(data), sep='~')
df['timestamp'] = pd.to_datetime(df['timestamp'], format=date_format)
df.timestamp = df.timestamp.map(lambda x: x.replace(microsecond=0))
df.set_index('timestamp', inplace=True)
grouped = df.groupby([df.api, df.index]).count()
print(grouped)

x = range(1, 6)
y = [[1, 4, 6, 4, 1], [2, 2, 7, 5, 4], [2, 8, 5, 1, 6]]

# Basic stacked area chart.
plt.stackplot(x, y, labels=['ActivateAPI', 'SuspendAPI', 'TerminateAPI'])
plt.legend(loc='upper left')
plt.show()

Upvotes: 1

Views: 624

Answers (1)

Quang Hoang

Reputation: 150745

Pandas has a .plot.area() function that draws an area plot where the x-axis is the index and each column is a category; the areas are stacked by default.
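To make that layout concrete, here is a minimal sketch with a small hand-made table. The values and the name `wide` are made up for illustration and are not taken from the question's logs; the Agg backend and the in-memory buffer are only there so the snippet runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import pandas as pd

# Hypothetical "wide" table: one row per second (the index), one column per API.
wide = pd.DataFrame(
    {"ActivateAPI": [1, 2, 0], "SuspendAPI": [0, 2, 2]},
    index=pd.to_datetime(
        ["2020-07-13 14:09:30", "2020-07-13 14:09:31", "2020-07-13 14:09:32"]
    ),
)

# The index becomes the x-axis and each column becomes one stacked area.
ax = wide.plot.area()
ax.set_xlabel("timestamp")
ax.set_ylabel("calls per second")

# Render to an in-memory PNG; pass a filename instead to write to disk.
buf = io.BytesIO()
ax.figure.savefig(buf, format="png")
```

Any dataframe that has this shape (time in the index, categories in the columns) can be handed to plot.area() the same way.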

In your case, you want to unstack api so that its values become columns, then use the provided plot.area() function. Also note that you can pass the index's name to groupby. So you can do:

grouped = df.groupby(['timestamp','api']).size()

grouped.unstack('api', fill_value=0).plot.area()
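For completeness, here is a self-contained sketch that runs the same groupby/unstack idea on the sample data from the question and feeds the result to matplotlib's stackplot, as in the question's manual example. A few things here are my own assumptions rather than part of the answer above: timestamps are floored with dt.floor("s") instead of the lambda, seconds with no calls are filled in as zero via reindex (otherwise the areas slope across the gaps), and the names `counts`, `full_range` and `total` are just illustrative.

```python
import io
from io import StringIO

import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import pandas as pd

data = """timestamp~api~token
07.13.2020 14:09:23,928~SuspendAPI~TOKEN1
07.13.2020 14:09:29,324~TerminateAPI~TOKEN1
07.13.2020 14:09:29,424~TerminateAPI~TOKEN1
07.13.2020 14:09:30,678~ActivateAPI~TOKEN1
07.13.2020 14:09:31,678~ActivateAPI~TOKEN1
07.13.2020 14:09:31,886~SuspendAPI~TOKEN1
07.13.2020 14:09:31,886~SuspendAPI~TOKEN1
07.13.2020 14:09:31,978~ActivateAPI~TOKEN1
07.13.2020 14:09:32,786~SuspendAPI~TOKEN1
07.13.2020 14:09:32,886~SuspendAPI~TOKEN1
07.13.2020 14:09:39,324~TerminateAPI~TOKEN1
07.13.2020 14:09:49,324~TerminateAPI~TOKEN1"""

df = pd.read_csv(StringIO(data), sep="~")
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%m.%d.%Y %H:%M:%S,%f")
df["timestamp"] = df["timestamp"].dt.floor("s")  # truncate to whole seconds

# Rows = seconds, columns = APIs, values = number of calls in that second.
counts = df.groupby(["timestamp", "api"]).size().unstack("api", fill_value=0)

# Assumption: a second with no log lines means zero calls, so fill the gaps.
full_range = pd.date_range(counts.index.min(), counts.index.max(), freq="s")
counts = counts.reindex(full_range, fill_value=0)

# Total throughput across all APIs, per second.
total = counts.sum(axis=1)

fig, ax = plt.subplots()
# stackplot expects one row of y-values per series, hence the transpose.
ax.stackplot(counts.index, counts.T.to_numpy(), labels=list(counts.columns))
ax.plot(total.index, total.to_numpy(), color="black", linewidth=1, label="total")
ax.legend(loc="upper left")
ax.set_xlabel("timestamp")
ax.set_ylabel("calls per second")
fig.autofmt_xdate()

# Render to an in-memory PNG; pass a filename instead to write to disk.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```

If you prefer to stay in pandas, `counts.plot.area()` should draw the same stacked areas from this table; the stackplot call is shown only because it matches the code in the question.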

Upvotes: 1
