Reputation: 12027
I have aggregated some log data into a simple CSV format; each row is essentially one API call. Each token has a limit/quota, and sometimes I get requests to increase a quota or limit. I would like to look at the traffic visually to understand the throughput of each type of API call and the total throughput for all API calls.
I have played around with the data in pandas and can get it into a table-style structure, grouping API call counts per second:
                                  token
api          timestamp
ActivateAPI  2020-07-13 14:09:30      1
             2020-07-13 14:09:31      2
SuspendAPI   2020-07-13 14:09:23      1
             2020-07-13 14:09:31      2
             2020-07-13 14:09:32      2
TerminateAPI 2020-07-13 14:09:29      2
             2020-07-13 14:09:39      1
             2020-07-13 14:09:49      1
I have also worked through a matplotlib example, so I know the concept of how to make a stacked area chart.
However, for the life of me I cannot seem to map my DataFrame onto a stacked area chart that shows time along the bottom (x-axis), counts up the left (y-axis), and one stacked area per API. Below is my code, which builds the DataFrame and manually makes the stacked chart. I need help rendering the chart from my DataFrame so that I can produce graphs for any of the web server logs.
import matplotlib.pyplot as plt
import pandas as pd
from io import StringIO
date_format = "%m.%d.%Y %H:%M:%S,%f"
data = """timestamp~api~token
07.13.2020 14:09:23,928~SuspendAPI~TOKEN1
07.13.2020 14:09:29,324~TerminateAPI~TOKEN1
07.13.2020 14:09:29,424~TerminateAPI~TOKEN1
07.13.2020 14:09:30,678~ActivateAPI~TOKEN1
07.13.2020 14:09:31,678~ActivateAPI~TOKEN1
07.13.2020 14:09:31,886~SuspendAPI~TOKEN1
07.13.2020 14:09:31,886~SuspendAPI~TOKEN1
07.13.2020 14:09:31,978~ActivateAPI~TOKEN1
07.13.2020 14:09:32,786~SuspendAPI~TOKEN1
07.13.2020 14:09:32,886~SuspendAPI~TOKEN1
07.13.2020 14:09:39,324~TerminateAPI~TOKEN1
07.13.2020 14:09:49,324~TerminateAPI~TOKEN1"""
df = pd.read_csv(StringIO(data), sep='~')
df['timestamp'] = pd.to_datetime(df['timestamp'], format=date_format)
df.timestamp = df.timestamp.map(lambda x: x.replace(microsecond=0))
df.set_index('timestamp', inplace=True)
grouped = df.groupby([df.api, df.index]).count()
print(grouped)
x = range(1, 6)
y = [[1, 4, 6, 4, 1], [2, 2, 7, 5, 4], [2, 8, 5, 1, 6]]
# Basic stacked area chart.
plt.stackplot(x, y, labels=['ActivateAPI', 'SuspendAPI', 'TerminateAPI'])
plt.legend(loc='upper left')
plt.show()
Upvotes: 1
Views: 624
Reputation: 150745
Pandas has a .plot.area() function that draws an area plot in which the x-axis is the index and the columns are the categories, which are stacked by default.
In your case, you want to unstack api so that its values become columns, and then use the provided plot.area() function. Also note that you can pass the index's name to groupby, so there is no need to reference df.index separately. So you can do:
grouped = df.groupby(['timestamp','api']).size()
grouped.unstack('api', fill_value=0).plot.area()
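For completeness, here is a minimal end-to-end sketch that applies the two lines above to the sample data from the question. The `asfreq` step is an addition that is not in the answer: it assumes that seconds with no calls at all should be drawn as zero rather than bridged by a straight line between neighbouring points. The names `wide` and `plot_throughput` are illustrative only.

```python
from io import StringIO

import pandas as pd

data = """timestamp~api~token
07.13.2020 14:09:23,928~SuspendAPI~TOKEN1
07.13.2020 14:09:29,324~TerminateAPI~TOKEN1
07.13.2020 14:09:29,424~TerminateAPI~TOKEN1
07.13.2020 14:09:30,678~ActivateAPI~TOKEN1
07.13.2020 14:09:31,678~ActivateAPI~TOKEN1
07.13.2020 14:09:31,886~SuspendAPI~TOKEN1
07.13.2020 14:09:31,886~SuspendAPI~TOKEN1
07.13.2020 14:09:31,978~ActivateAPI~TOKEN1
07.13.2020 14:09:32,786~SuspendAPI~TOKEN1
07.13.2020 14:09:32,886~SuspendAPI~TOKEN1
07.13.2020 14:09:39,324~TerminateAPI~TOKEN1
07.13.2020 14:09:49,324~TerminateAPI~TOKEN1"""

df = pd.read_csv(StringIO(data), sep="~")
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%m.%d.%Y %H:%M:%S,%f")
# Truncate each timestamp to the whole second so calls can be counted per second.
df["timestamp"] = df["timestamp"].dt.floor("s")

# Count calls per (second, api), then pivot so each API becomes a column.
wide = df.groupby(["timestamp", "api"]).size().unstack("api", fill_value=0)

# Assumption (not in the answer above): insert the missing seconds as zero rows.
wide = wide.asfreq("s", fill_value=0)


def plot_throughput(table):
    """Draw one stacked area per column of `table` (hypothetical helper)."""
    import matplotlib.pyplot as plt

    ax = table.plot.area()  # stacked by default
    ax.set_xlabel("time")
    ax.set_ylabel("API calls per second")
    ax.legend(loc="upper left")
    plt.show()


if __name__ == "__main__":
    plot_throughput(wide)
```

Because the areas are stacked, the top edge of the chart is the total throughput across all APIs, which covers the "total for all API calls" part of the question without a separate series.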
Upvotes: 1