Jack022

Reputation: 1257

Matplotlib error while setting colors on a scatter plot

I'm trying to draw a scatter plot with Matplotlib, but I'm having trouble setting the colors.

Here is my code:

colors = [(141, 0, 248, 0.4) if x >= 150 and x < 200 else 
          (0, 244, 248, 0.4) if x >= 200 and x < 400 else
          (255, 255, 0, 0.7) if x >= 400 and x < 600 else
          (255, 140, 0, 0.8) if x >= 600 else (255, 0, 0, 0.8) for x in MyData.Qty]

print(len(colors))
ax1.scatter(MyData.Date, MyData.Rate, s=20, c=colors, marker='_')

Basically, I have a column called Qty in my dataframe, and the color is chosen according to its value: for example, if Qty is bigger than some threshold, the point will be red, and so on.

The code above gives me the following error:

'c' argument has 2460 elements, which is inconsistent with 'x' and 'y' with size 615.

I have no idea why this happens, because the following code works without any problem:

colors = ['red' if x >= 150 and x < 200 else 
          'yellow' if x >= 200 and x < 400 else
          'green' if x >= 400 and x < 600 else
          'blue' if x >= 600 else 'purple' for x in MyData.Qty]

Here is a sample of my data:

    Date  Rate          Qty
0     18  140   207.435145
0     18  141   155.019884
0     18  178  1222.215201
0     18  230   256.010358
0     19  9450  1211.310384

The following will work too:

colors = [(1,1,0,0.8) if x>1000 else (1,0,0,0.4) for x in MyData.Qty]

Upvotes: 0

Views: 1398

Answers (1)

Tom

Reputation: 8790

Someone posted (and then deleted) a comment pointing to the documentation. Here is the part they were referring to (from plt.scatter):

Note that c should not be a single numeric RGB or RGBA sequence because that is indistinguishable from an array of values to be colormapped. If you want to specify the same RGB or RGBA value for all points, use a 2-D array with a single row. Otherwise, value-matching will have precedence in case of a size matching with x and y.

In addition, it seems that matplotlib expects the RGB values to be in the range 0 to 1 rather than 0 to 255. So I added two lines to a) explicitly convert colors to a 2-D numpy array and b) divide the RGB values by 255 (leaving the alpha value untouched).

import matplotlib.pyplot as plt
import numpy as np

fig1, ax1 = plt.subplots()

colors = [(141, 0, 248, 0.4) if x >= 150 and x < 200 else 
          (0, 244, 248, 0.4) if x >= 200 and x < 400 else
          (255, 255, 0, 0.7) if x >= 400 and x < 600 else
          (255, 140, 0, 0.8) if x >= 600 else (255, 0, 0, 0.8) for x in MyData['Qty']]

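# Addition: convert colors to a 2-D float array (one RGBA row per point)
colors = np.array(colors, dtype=float)
# Scale only the RGB columns to the 0-1 range; alpha is already in 0-1
colors[:, :3] /= 255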

ax1.scatter(MyData['Date'], MyData["Rate"], s=20, c=colors, marker='_')

If you remove the scaling (but still convert to a 2-D array), you get the same error as in the question. My guess is that when matplotlib cannot interpret the rows as RGBA colors (because the values fall outside the 0 to 1 range), it falls back to treating c as a flat array of values. That array has 4 x 615 = 2460 elements, which no longer matches the 615 points in x and y.
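As a rough sketch of the same idea without numpy, the scaling can also be done per tuple before the list ever reaches scatter. The helper names qty_to_rgba255 and to_unit_rgba and the qty_sample list are made up for this illustration. The thresholds and colors are copied from the question, and the sample values are the Qty column from the data shown there.

```python
# Hypothetical helpers for illustration; thresholds and colors come from the question.

def qty_to_rgba255(qty):
    """Pick an (R, G, B, alpha) tuple with 0-255 color channels for one Qty value."""
    if 150 <= qty < 200:
        return (141, 0, 248, 0.4)
    if 200 <= qty < 400:
        return (0, 244, 248, 0.4)
    if 400 <= qty < 600:
        return (255, 255, 0, 0.7)
    if qty >= 600:
        return (255, 140, 0, 0.8)
    return (255, 0, 0, 0.8)  # fallback, e.g. for values below 150


def to_unit_rgba(color):
    """Scale the RGB channels from 0-255 down to 0-1; alpha is left untouched."""
    r, g, b, a = color
    return (r / 255, g / 255, b / 255, a)


# Qty values taken from the sample data in the question.
qty_sample = [207.435145, 155.019884, 1222.215201, 256.010358, 1211.310384]
colors = [to_unit_rgba(qty_to_rgba255(q)) for q in qty_sample]

# One RGBA tuple per point, and every channel now lies within [0, 1].
assert len(colors) == len(qty_sample)
assert all(0.0 <= channel <= 1.0 for color in colors for channel in color)
```

The resulting list has the same shape as the (1,1,0,0.8)-style list that already worked in the question, so it should be usable as c=colors in the ax1.scatter call.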

Upvotes: 2
