Reputation: 565
This has been troubling me for the past 30 minutes. What I'd like to do is draw a scatter plot colored by category. I took a look at the documentation, but I haven't been able to find the answer there. I looked here, but when I ran that in IPython Notebook, I didn't get anything.
Here's my data frame:
time cpu wait category
8 1 0.5 a
9 2 0.2 a
2 3 0.1 b
10 4 0.7 c
3 5 0.2 c
5 6 0.8 b
Ideally, I'd like to have a scatter plot that shows CPU on the x axis, wait on the y axis, and each point on the graph is distinguished by category. So for example, if a=red, b=blue, and c=green then point (1, 0.5) and (2, 0.2) should be red, (3, 0.1) and (6, 0.8) should be blue, etc.
How would I do this with pandas or matplotlib? Whichever does the job.
Upvotes: 3
Views: 8920
Reputation: 109520
This is essentially the same answer as @JoeCondron, but as a two-liner:
cmap = {'a': 'red', 'b': 'blue', 'c': 'yellow'}
df.plot(x='cpu', y='wait', kind='scatter',
        colors=[cmap.get(c, 'black') for c in df.category])
If no color is mapped for the category, it defaults to black.
EDIT:
The above works for Pandas 0.14.1. For 0.16.2, 'colors' needs to be changed to 'c':
df.plot(x='cpu', y='wait', kind='scatter',
        c=[cmap.get(c, 'black') for c in df.category])
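For anyone who wants to paste and run this, here is a self-contained sketch of the 'c' variant. It rebuilds the data frame from the question and uses the question's colors (a=red, b=blue, c=green). The name point_colors is mine, and I'm assuming a reasonably recent pandas and matplotlib rather than the exact versions mentioned above.

```python
import pandas as pd

# Data frame from the question
df = pd.DataFrame({
    "time": [8, 9, 2, 10, 3, 5],
    "cpu": [1, 2, 3, 4, 5, 6],
    "wait": [0.5, 0.2, 0.1, 0.7, 0.2, 0.8],
    "category": ["a", "a", "b", "c", "c", "b"],
})

# Category -> color; anything not listed falls back to black
cmap = {"a": "red", "b": "blue", "c": "green"}
point_colors = [cmap.get(c, "black") for c in df["category"]]

# One color per row, in the same order as the rows
ax = df.plot(x="cpu", y="wait", kind="scatter", c=point_colors)
```

In a notebook the figure should show up once inline plotting is enabled; in a script you would still need matplotlib.pyplot.show() or ax.figure.savefig(...).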
Upvotes: 4
Reputation: 8906
You could do
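import matplotlib.pyplot as plt

# Map each category to a matplotlib color alias
color_map = {'a': 'r', 'b': 'b', 'c': 'y'}
ax = plt.subplot()
x, y = df.cpu, df.wait
# One color per row, looked up from the category column
colors = df.category.map(color_map)
ax.scatter(x, y, color=colors)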
This will give you red for category a, blue for b, and yellow for c. In other words, you can pass a list of color aliases of the same length as the arrays. You can check out the myriad available colours here: http://matplotlib.org/api/colors_api.html. I don't think the plot method is very useful for scatter plots.
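One thing a per-point color list doesn't give you is a legend. A different way to get one (not what the code above does, just a sketch) is to loop over the categories with groupby and call scatter once per group with a label. The data frame is rebuilt from the question, and the fallback color 'k' for unmapped categories is my own choice.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Data frame from the question
df = pd.DataFrame({
    "time": [8, 9, 2, 10, 3, 5],
    "cpu": [1, 2, 3, 4, 5, 6],
    "wait": [0.5, 0.2, 0.1, 0.7, 0.2, 0.8],
    "category": ["a", "a", "b", "c", "c", "b"],
})

color_map = {"a": "r", "b": "b", "c": "y"}

fig, ax = plt.subplots()
# One scatter call per category, so each gets its own legend entry
for name, group in df.groupby("category"):
    ax.scatter(group["cpu"], group["wait"],
               color=color_map.get(name, "k"), label=name)

ax.set_xlabel("cpu")
ax.set_ylabel("wait")
ax.legend(title="category")
```

The trade-off is one collection per category instead of a single one, which hardly matters for a handful of categories.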
Upvotes: 2
Reputation: 3247
I'd create a column holding the color for each row's category, then do the following, where ax is a matplotlib Axes and df is your data frame:
ax.scatter(df['cpu'], df['wait'], marker='.', c=df['colors'], s=100)
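In case the "create a column" step isn't obvious, here is a minimal sketch of one way to do it, using Series.map with a dict. The color_map values and the black fallback for unmapped categories are assumptions on my part; only the 'colors' column name comes from the line above.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Data frame from the question
df = pd.DataFrame({
    "time": [8, 9, 2, 10, 3, 5],
    "cpu": [1, 2, 3, 4, 5, 6],
    "wait": [0.5, 0.2, 0.1, 0.7, 0.2, 0.8],
    "category": ["a", "a", "b", "c", "c", "b"],
})

# Build the 'colors' column: look up each category, fall back to black
color_map = {"a": "red", "b": "blue", "c": "green"}
df["colors"] = df["category"].map(color_map).fillna("black")

fig, ax = plt.subplots()
ax.scatter(df["cpu"], df["wait"], marker=".", c=df["colors"], s=100)
```

Keeping the colors in the frame means they stay aligned with the rows if you later sort or filter df.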
Upvotes: 2