Reputation: 815
I have some data as follows:
+---------+-------+---------+----------------+
| Machine | Event | Outcome | Duration Total |
+---------+-------+---------+----------------+
| a | 1 | FAIL | 1127 |
| a | 2 | FAIL | 56099 |
| a | 2 | PASS | 15213 |
| a | 3 | FAIL | 13891 |
| a | 3 | PASS | 13934 |
| a | 4 | FAIL | 6844 |
| a | 5 | FAIL | 6449 |
| b | 1 | FAIL | 21331 |
| b | 2 | FAIL | 30362 |
| b | 3 | FAIL | 12194 |
| b | 3 | PASS | 7390 |
| b | 4 | FAIL | 35472 |
| b | 4 | PASS | 7731 |
| b | 5 | FAIL | 7654 |
| c | 1 | FAIL | 16833 |
| c | 1 | PASS | 21337 |
| c | 2 | FAIL | 440 |
| c | 2 | PASS | 14320 |
| c | 3 | FAIL | 5281 |
+---------+-------+---------+----------------+
I'm trying to make a categorical scatter plot of the total duration for each event on each machine, or any other visualization that lets me compare them against each other.
What would be a good choice, and how should I go about it?
Upvotes: 0
Views: 68
Reputation: 2579
import matplotlib.pyplot as plt
import seaborn as sns
# The y column is named 'Duration Total' in the table from the question.
sns.catplot(x='Event', y='Duration Total', hue='Machine', col='Outcome', data=df)
Give this a try; it draws two scatter plots side by side, one for FAIL and one for PASS. The x axis is the event, the y axis is the duration, and the color of the dots is based on the machine. "df" is your DataFrame. You can remove col = 'Outcome' to have both FAIL and PASS on the same graph.
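To make the call above easy to try, here is a minimal sketch that rebuilds the table from the question as a DataFrame and passes it to catplot. The column names ('Machine', 'Event', 'Outcome', 'Duration Total') are assumed to match the table headers exactly; if your DataFrame uses different names, adjust them. The variable names rows, df and g are just for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Rows copied from the table in the question: (Machine, Event, Outcome, Duration Total)
rows = [
    ("a", 1, "FAIL", 1127), ("a", 2, "FAIL", 56099), ("a", 2, "PASS", 15213),
    ("a", 3, "FAIL", 13891), ("a", 3, "PASS", 13934), ("a", 4, "FAIL", 6844),
    ("a", 5, "FAIL", 6449),
    ("b", 1, "FAIL", 21331), ("b", 2, "FAIL", 30362), ("b", 3, "FAIL", 12194),
    ("b", 3, "PASS", 7390), ("b", 4, "FAIL", 35472), ("b", 4, "PASS", 7731),
    ("b", 5, "FAIL", 7654),
    ("c", 1, "FAIL", 16833), ("c", 1, "PASS", 21337), ("c", 2, "FAIL", 440),
    ("c", 2, "PASS", 14320), ("c", 3, "FAIL", 5281),
]

# Assumed column names, taken from the table headers
df = pd.DataFrame(rows, columns=["Machine", "Event", "Outcome", "Duration Total"])

# One panel per Outcome; dots are colored by Machine
g = sns.catplot(x="Event", y="Duration Total", hue="Machine", col="Outcome", data=df)

if __name__ == "__main__":
    plt.show()
```

Since the durations range from a few hundred to over 50000, a log scale on the y axis (for example g.set(yscale="log")) may make the small values easier to compare.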
EDIT:
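# Draw PASS and FAIL on the same axes: PASS as default dots, FAIL as 'x' markers.
# The y column is named 'Duration Total' in the table from the question.
fig, ax = plt.subplots(figsize=(10, 10))
sns.scatterplot(x='Event', y='Duration Total', hue='Machine', data=df[df['Outcome'] == 'PASS'], ax=ax)
sns.scatterplot(x='Event', y='Duration Total', hue='Machine', data=df[df['Outcome'] == 'FAIL'], ax=ax,
                style='Machine', markers=['x', 'x', 'x'])
handles, labels = ax.get_legend_handles_labels()
# The eight labels below assume the legend has a 'Machine' title entry plus a, b, c for each call.
# That depends on the seaborn version, so check len(handles) and adjust the list if yours differs.
ax.legend(handles, ['Machine - Pass', 'a', 'b', 'c', 'Machine - Fail', 'a', 'b', 'c'])
plt.show()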
Upvotes: 1