jason adams

Reputation: 565

scatter plot by category in pandas

This has been troubling me for the past 30 minutes. I'd like to draw a scatter plot with the points colored by category. I took a look at the documentation, but I couldn't find the answer there. I looked here, but when I ran that in an IPython Notebook, I didn't get anything.

Here's my data frame:

time    cpu   wait    category 
8       1     0.5     a 
9       2     0.2     a
2       3     0.1     b
10      4     0.7     c
3       5     0.2     c
5       6     0.8     b

Ideally, I'd like to have a scatter plot that shows CPU on the x axis, wait on the y axis, and each point on the graph is distinguished by category. So for example, if a=red, b=blue, and c=green then point (1, 0.5) and (2, 0.2) should be red, (3, 0.1) and (6, 0.8) should be blue, etc.

How would I do this with pandas or matplotlib? Whichever does the job.

Upvotes: 3

Views: 8920

Answers (3)

Alexander

Reputation: 109520

This is essentially the same answer as @JoeCondron's, but as a two-liner:

cmap = {'a': 'red', 'b': 'blue', 'c': 'yellow'}
df.plot(x='cpu', y='wait', kind='scatter', 
        colors=[cmap.get(c, 'black') for c in df.category])

If no color is mapped for the category, it defaults to black.

EDIT:

The above works for Pandas 0.14.1. For 0.16.2, 'colors' needs to be changed to 'c':

df.plot(x='cpu', y='wait', kind='scatter',
        c=[cmap.get(c, 'black') for c in df.category])
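
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes a reasonably recent pandas (where the keyword is `c`, as in the edit above) and rebuilds the data frame from the question. The name `point_colors` is only for illustration, and green is used for category c to match the question rather than the yellow in the snippet above.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "time": [8, 9, 2, 10, 3, 5],
    "cpu": [1, 2, 3, 4, 5, 6],
    "wait": [0.5, 0.2, 0.1, 0.7, 0.2, 0.8],
    "category": ["a", "a", "b", "c", "c", "b"],
})

# One color per category; unmapped categories fall back to black.
cmap = {"a": "red", "b": "blue", "c": "green"}
point_colors = [cmap.get(c, "black") for c in df["category"]]

# pandas hands the list of colors through to matplotlib's scatter.
ax = df.plot(x="cpu", y="wait", kind="scatter", c=point_colors)
```

In a notebook the figure should render inline; in a plain script you would probably need `matplotlib.pyplot.show()` afterwards.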

Upvotes: 4

JoeCondron

Reputation: 8906

You could do

import matplotlib.pyplot as plt

color_map = {'a': 'r', 'b': 'b', 'c': 'y'}
ax = plt.subplot()
x, y = df.cpu, df.wait
# Map each row's category to its color alias
colors = df.category.map(color_map)
ax.scatter(x, y, color=colors)

This will give you red for category a, blue for b, and yellow for c. You can pass a list of color aliases of the same length as the data arrays. You can check out the myriad available colours here: http://matplotlib.org/api/colors_api.html. I don't think the plot method is very useful for scatter plots.
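
If you also want a legend saying which color is which category, one possible variation (a sketch, not part of the approach above) is to call `scatter` once per category and label each call. The data frame is rebuilt from the question, and green is assumed for c as the question asked.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "time": [8, 9, 2, 10, 3, 5],
    "cpu": [1, 2, 3, 4, 5, 6],
    "wait": [0.5, 0.2, 0.1, 0.7, 0.2, 0.8],
    "category": ["a", "a", "b", "c", "c", "b"],
})

color_map = {"a": "r", "b": "b", "c": "g"}

fig, ax = plt.subplots()
# One scatter call per category, so each gets its own legend entry.
for name, group in df.groupby("category"):
    ax.scatter(group["cpu"], group["wait"],
               color=color_map.get(name, "k"), label=name)

ax.set_xlabel("cpu")
ax.set_ylabel("wait")
ax.legend(title="category")
```

The trade-off is one artist per category instead of a single scatter, which is fine for a handful of categories.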

Upvotes: 2

alex314159

Reputation: 3247

I'd create a column with your colors based on category, then do the following, where ax is a matplotlib ax and df is your dataframe:

ax.scatter(df['cpu'], df['wait'], marker='.', c=df['colors'], s=100)
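
A minimal sketch of the column-creation step, assuming the data frame from the question and the red/blue/green mapping it asks for. The `fillna('black')` fallback for unmapped categories is an addition for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "time": [8, 9, 2, 10, 3, 5],
    "cpu": [1, 2, 3, 4, 5, 6],
    "wait": [0.5, 0.2, 0.1, 0.7, 0.2, 0.8],
    "category": ["a", "a", "b", "c", "c", "b"],
})

# Build the 'colors' column from the category; unmapped values become black.
category_colors = {"a": "red", "b": "blue", "c": "green"}
df["colors"] = df["category"].map(category_colors).fillna("black")

fig, ax = plt.subplots()
ax.scatter(df["cpu"], df["wait"], marker=".", c=df["colors"], s=100)
```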

Upvotes: 2
