Pandas Scatterplot Using Data Frame Fields to Derive Colors and Legend

I want to create a scatterplot in pandas that plots two columns against each other, uses a third column for the point size, and colors each point by a label (in the case below, last_name).

I then want a legend that shows a dot in each color next to the corresponding last_name value.

Each last name should get a different color, and the legend should show, for example, a green dot next to Miller, a red dot next to Jacobson, and so on.

%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np


raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
    'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
    'female': [0, 1, 1, 0, 1],
    'age': [42, 52, 36, 24, 73],
    'preTestScore': [4, 24, 31, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age', 'female', 'preTestScore', 'postTestScore'])

plt.scatter(df.preTestScore, df.postTestScore, s=df.age, label=df.last_name)
plt.legend(loc='upper left', prop={'size':6}, bbox_to_anchor=(1,1),ncol=1)

And that gives me something like this:

[Image: the resulting scatterplot]

I can't figure out how to get the colors in at all (ideally, I'd love to use a palette), or how to get the legend to show the last name next to its dot.

Any help would be much appreciated. Thanks!

Note: I took this example from Chris Albon.

Upvotes: 3

Views: 2935

Answers (2)

ImportanceOfBeingErnest

Reputation: 339102

First, in order to produce colors, you can add a column with colors to your dataframe. Those colors can then be passed to the c keyword argument of scatter.
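Since the question asks about a palette, one possible variation (not part of this answer) is to derive that colors column from a Matplotlib colormap instead of hard-coding one color per row. A minimal sketch, assuming the qualitative "tab10" colormap; `palette_colors` is a hypothetical helper name:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex
import pandas as pd

def palette_colors(labels, cmap_name="tab10"):
    """Hypothetical helper: map each distinct label to one colormap color."""
    cmap = plt.get_cmap(cmap_name)
    unique = list(dict.fromkeys(labels))  # distinct labels, first-seen order
    # Integer input indexes a listed colormap directly; wrap around past cmap.N
    lookup = {name: to_hex(cmap(i % cmap.N)) for i, name in enumerate(unique)}
    return [lookup[name] for name in labels]

df = pd.DataFrame({
    "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"],
    "age": [42, 52, 36, 24, 73],
    "preTestScore": [4, 24, 31, 2, 3],
    "postTestScore": [25, 94, 57, 62, 70],
})
df["colors"] = palette_colors(df["last_name"])

# The colors column is passed to c just like the hard-coded version
sc = plt.scatter(df.preTestScore, df.postTestScore, s=df.age,
                 c=df["colors"].tolist())
```

Repeated names get the same color, so the same mapping also works when a last name appears more than once.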

The usual solution for creating legend entries that are not easily accessible is to generate proxy artists. In this case, one creates a set of markers in the different colors and passes it to the handles argument of legend. The legend labels are then simply the last_name values from the dataframe.

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.lines

raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
    'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
    'female': [0, 1, 1, 0, 1],
    'age': [42, 52, 36, 24, 73],
    'preTestScore': [4, 24, 31, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70],
    'colors' : ["r", "g", "b", "k", "cyan"]} # add a column for colors
df = pd.DataFrame(raw_data, 
     columns = ['first_name', 'last_name', 'age', 'female', 'preTestScore', 'postTestScore', "colors"])

#supply colors as argument for c
plt.scatter(df.preTestScore, df.postTestScore, s=df.age, c=df.colors) 
# generate proxy artists for legend
handles = [matplotlib.lines.Line2D([],[], marker="o", color=c, linestyle="none") for c in df.colors.values]
# supply proxy artists to handles and last names to labels
plt.legend(handles=handles, labels=list(df.last_name.values), 
           loc='upper left', prop={'size':6}, bbox_to_anchor=(1,1),ncol=1, numpoints=1)
plt.subplots_adjust(right=0.8)
plt.show()
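The code above makes one handle per row, which fits the question because every last name is unique. If names could repeat, a variant is to build one proxy artist per distinct (name, color) pair so the legend does not list the same person twice. A sketch under that assumption; the repeated "Miller" row is made-up illustration data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import matplotlib.lines
import pandas as pd

# Illustration data: "Miller" appears twice (assumed, not from the question)
df = pd.DataFrame({
    "last_name": ["Miller", "Jacobson", "Miller"],
    "age": [42, 52, 30],
    "preTestScore": [4, 24, 10],
    "postTestScore": [25, 94, 40],
    "colors": ["r", "g", "r"],
})

fig, ax = plt.subplots()
ax.scatter(df.preTestScore, df.postTestScore, s=df.age, c=df.colors.tolist())

# One (name, color) pair per distinct last name, in first-seen order
pairs = df[["last_name", "colors"]].drop_duplicates()
handles = [matplotlib.lines.Line2D([], [], marker="o", color=c, linestyle="none")
           for c in pairs.colors]
leg = ax.legend(handles=handles, labels=list(pairs.last_name),
                loc="upper left", bbox_to_anchor=(1, 1), numpoints=1)
```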

[Image: the resulting plot]

Upvotes: 5

Stop harming Monica

Reputation: 12590

A single call to scatter makes only one legend entry. If you want a legend entry for each dot, the easiest way is to call a plotting method once per dot. This should not be a problem performance-wise, because you would not want thousands of entries in your legend anyway. I will use plot because it works well for a single dot, but you could use scatter as well if you need fancier effects.

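import numpy as np
import matplotlib.pyplot as plt
# df is the DataFrame from the question; one plot call per row -> one legend entry per person
for _, row in df.iterrows():
    # plot's ms is a diameter in points, scatter's s is an area, hence the square root
    plt.plot(row.preTestScore, row.postTestScore, 'o', ms=np.sqrt(row.age),
             label=row.last_name)
plt.legend(loc='upper left', bbox_to_anchor=(1,1))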
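A related sketch, not taken from this answer: grouping by last_name and calling scatter once per group also gives one legend entry per distinct name, and it keeps a per-point size through s even if a name has several rows. Each scatter call without an explicit color takes the next color from the axes' default color cycle.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"],
    "age": [42, 52, 36, 24, 73],
    "preTestScore": [4, 24, 31, 2, 3],
    "postTestScore": [25, 94, 57, 62, 70],
})

fig, ax = plt.subplots()
# One scatter call per distinct last name (first-seen order) -> one legend entry each;
# every call picks the next color from the color cycle
for name, group in df.groupby("last_name", sort=False):
    ax.scatter(group.preTestScore, group.postTestScore, s=group.age, label=name)
leg = ax.legend(loc="upper left", bbox_to_anchor=(1, 1))
```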

[Image: the resulting plot]

I haven't figured out (yet) how to make the dots in the legend the same size, and I am not sure that's what you want anyway. I think different sizes look good and can help locate people in the plot.
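If equal-sized legend dots are wanted, one approach that avoids touching legend internals is to read the handles and labels back from the axes and replace each handle with a proxy Line2D that copies its color but uses a fixed markersize. A sketch that repeats the per-row plot calls; the legend size of 6 points is an arbitrary assumption:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"],
    "age": [42, 52, 36, 24, 73],
    "preTestScore": [4, 24, 31, 2, 3],
    "postTestScore": [25, 94, 57, 62, 70],
})

fig, ax = plt.subplots()
for _, row in df.iterrows():
    ax.plot(row.preTestScore, row.postTestScore, "o", ms=np.sqrt(row.age),
            label=row.last_name)

# Reuse each plotted line's color, but give every legend marker the same size
handles, labels = ax.get_legend_handles_labels()
LEGEND_MS = 6  # assumed legend marker size, in points
proxies = [Line2D([], [], marker="o", linestyle="none", color=h.get_color(),
                  markersize=LEGEND_MS) for h in handles]
leg = ax.legend(proxies, labels, loc="upper left", bbox_to_anchor=(1, 1))
```

The plotted dots keep their age-based sizes; only the legend markers are uniform.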

Alternatively, you could make a single call to scatter, examine the properties of the returned PathCollection, and build the legend by hand, but I think my approach is cleaner.
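That single-call alternative might look like the following sketch. It reads the per-point face colors back from the returned PathCollection with get_facecolors() and builds one proxy marker per point. It assumes the colors were passed explicitly through c as color names (as in the other answer's colors column), so the face colors are set when the collection is created.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import pandas as pd

df = pd.DataFrame({
    "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"],
    "age": [42, 52, 36, 24, 73],
    "preTestScore": [4, 24, 31, 2, 3],
    "postTestScore": [25, 94, 57, 62, 70],
    "colors": ["r", "g", "b", "k", "cyan"],
})

fig, ax = plt.subplots()
# A single scatter call; explicit colors go through c
sc = ax.scatter(df.preTestScore, df.postTestScore, s=df.age, c=df.colors.tolist())

# One RGBA row per point, in the same order as the DataFrame rows
facecolors = sc.get_facecolors()
proxies = [Line2D([], [], marker="o", linestyle="none", color=tuple(fc))
           for fc in facecolors]
leg = ax.legend(proxies, list(df.last_name), loc="upper left", bbox_to_anchor=(1, 1))
```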

Upvotes: 1
