Bubble Plot by Shape per data split type

Question

I have created a bubble plot of true vs. predicted label values, and I would like to know whether it is possible to change the marker shapes according to each point's data split. I want to keep the colors of my plot per interval_size and only have the shape change according to the data split.

Data Table

      min       max         y  interval_size    y_pred  split
 0.654531  1.021657  0.837415       0.367126  0.838094  train
 0.783401  1.261898  1.000000       0.478497  1.022649  valid
-0.166070  0.543749  0.059727       0.709819  0.188840  train
 0.493270  1.112610  0.504393       0.619340  0.802940  valid
 0.140510  0.572957  0.479063       0.432447  0.356734  train
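To make the examples in this thread reproducible, one way to load the sample rows above into a pandas DataFrame named df (the name the plotting code assumes) is sketched below. The values are copied from the table; in these five rows interval_size happens to equal max minus min, which the sketch checks as a sanity test (this is only verified for the sample, not for the full dataset).

```python
import pandas as pd

# Sample rows from the question's data table, copied as-is.
df = pd.DataFrame(
    {
        "min": [0.654531, 0.783401, -0.166070, 0.493270, 0.140510],
        "max": [1.021657, 1.261898, 0.543749, 1.112610, 0.572957],
        "y": [0.837415, 1.000000, 0.059727, 0.504393, 0.479063],
        "interval_size": [0.367126, 0.478497, 0.709819, 0.619340, 0.432447],
        "y_pred": [0.838094, 1.022649, 0.188840, 0.802940, 0.356734],
        "split": ["train", "valid", "train", "valid", "train"],
    }
)

# In these sample rows interval_size is the width of the [min, max] interval.
assert ((df["max"] - df["min"]) - df["interval_size"]).abs().max() < 1e-9
```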

Plot 1

Plot 2

Plot Code

import matplotlib.pyplot as plt
import seaborn as sns

# df holds the columns shown in the data table above
sns.set_context("talk", font_scale=1.1)
plt.figure(figsize=(10, 6))
sns.scatterplot(x="y",
                y="y_pred",
                size="interval_size",   # bubble size by interval size
                hue="interval_size",    # bubble color by interval size
                palette="viridis",
                data=df,
                alpha=0.65)
# Put the legend outside the axes
plt.legend(bbox_to_anchor=(1.01, 1), loc="upper left", borderaxespad=0)

# Plot characteristics
plt.title("True vs Predicted Labels", fontsize=36)
plt.xlabel("True Labels", fontsize=25)
plt.ylabel("Predicted Labels", fontsize=25)

Question:

Validation data would be nice to include. How can I differentiate the splits by marker shape, e.g. triangle vs. circle?

Timothy Chan · Accepted Answer

Seaborn has a lot of in-depth customization packed into simple parameters. For your code, you simply want to add a keyword parameter to your sns.scatterplot() function:

style = 'split',
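A minimal sketch of the style keyword on its own, assuming seaborn 0.11 or newer and the sample rows from the question. Here hue and palette keep the viridis colors tied to interval_size, while style lets seaborn pick one default marker per split. The Agg backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# A subset of the question's sample rows (only the columns the plot needs).
df = pd.DataFrame(
    {
        "y": [0.837415, 1.000000, 0.059727, 0.504393, 0.479063],
        "y_pred": [0.838094, 1.022649, 0.188840, 0.802940, 0.356734],
        "interval_size": [0.367126, 0.478497, 0.709819, 0.619340, 0.432447],
        "split": ["train", "valid", "train", "valid", "train"],
    }
)

fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(
    data=df,
    x="y",
    y="y_pred",
    hue="interval_size",  # color still encodes interval_size
    palette="viridis",
    style="split",        # seaborn picks a default marker for each split
    ax=ax,
)

# The legend gains a "split" section listing one marker per category.
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_labels)
```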

Adding style='split' assigns a different marker to each category in the split column, using seaborn's default marker set. If you want to choose the specific markers yourself, you can pass another parameter that maps each categorical value to a Matplotlib marker code:

markers = {'train': 'X', 'valid':'s'},
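If there are more than two splits (for example a hypothetical test split), writing the dict by hand gets tedious. One option, sketched below, is to zip the sorted split names with a list of marker codes; the marker_cycle list is an arbitrary choice. Two constraints are worth knowing: seaborn expects the dict to contain an entry for every level of the style column, and it raises an error if filled markers such as 'o' or 'X' are mixed with line-art markers such as '+' or 'x'.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame(
    {
        "y": [0.837415, 1.000000, 0.059727, 0.504393, 0.479063],
        "y_pred": [0.838094, 1.022649, 0.188840, 0.802940, 0.356734],
        "interval_size": [0.367126, 0.478497, 0.709819, 0.619340, 0.432447],
        "split": ["train", "valid", "train", "valid", "train"],
    }
)

# Assumed marker order; all four are filled markers, so they can be used together.
marker_cycle = ["o", "^", "s", "D"]

# Pair each split level (alphabetical order) with the next marker in the cycle.
# zip stops at the shorter list, so extend marker_cycle if there are more splits.
levels = sorted(df["split"].unique())
split_markers = dict(zip(levels, marker_cycle))
print(split_markers)  # {'train': 'o', 'valid': '^'}

fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(
    data=df,
    x="y",
    y="y_pred",
    hue="interval_size",
    palette="viridis",
    style="split",
    markers=split_markers,  # explicit split -> marker mapping
    ax=ax,
)
```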

The marker codes can be found on the Matplotlib website (https://matplotlib.org/3.1.0/api/markers_api.html).
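To check what a code means before using it, Matplotlib also exposes the code-to-name table in Python as MarkerStyle.markers, and MarkerStyle.filled_markers lists the filled codes. The sketch below looks up the circle and triangle codes mentioned in the question plus the two codes used in this answer.

```python
from matplotlib.markers import MarkerStyle

# Circle and triangle from the question, 'X' and 's' from the answer.
codes = ["o", "^", "X", "s"]

# MarkerStyle.markers maps each marker code to a descriptive name.
names = {code: MarkerStyle.markers[code] for code in codes}
print(names)  # {'o': 'circle', '^': 'triangle_up', 'X': 'x_filled', 's': 'square'}

# seaborn will not mix filled and line-art markers in one plot,
# so it helps to know which group each code belongs to.
for code in codes + ["+"]:
    kind = "filled" if code in MarkerStyle.filled_markers else "line art"
    print(code, kind)
```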

The final code should look like:

sns.scatterplot(x="y",
                y="y_pred",
                size="interval_size",
                hue="interval_size",    # colors stay mapped to interval_size
                palette="viridis",
                style="split",          # marker shape by data split
                markers={'train': 'X', 'valid': 's'},
                data=df,
                alpha=0.65)
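For comparison, a similar plot can be drawn with plain Matplotlib instead of seaborn, which stays closer to the c= and cmap= arguments in the question. The sketch makes one ax.scatter call per split, each with its own marker, and shares a single Normalize across the calls so both splits map interval_size onto the same viridis scale, which a colorbar then explains. The marker choices are assumptions, and the size formula is taken from the question's s=(interval_size**2)*50, so the bubbles may need rescaling.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.colors import Normalize

df = pd.DataFrame(
    {
        "y": [0.837415, 1.000000, 0.059727, 0.504393, 0.479063],
        "y_pred": [0.838094, 1.022649, 0.188840, 0.802940, 0.356734],
        "interval_size": [0.367126, 0.478497, 0.709819, 0.619340, 0.432447],
        "split": ["train", "valid", "train", "valid", "train"],
    }
)

# Assumed marker per split: circle for train, triangle for valid.
split_markers = {"train": "o", "valid": "^"}

# One shared color scale so every split uses the same interval_size -> color mapping.
norm = Normalize(vmin=df["interval_size"].min(), vmax=df["interval_size"].max())

fig, ax = plt.subplots(figsize=(10, 6))
for split, group in df.groupby("split"):
    points = ax.scatter(
        group["y"],
        group["y_pred"],
        c=group["interval_size"],
        cmap="viridis",
        norm=norm,
        s=(group["interval_size"] ** 2) * 50,  # size formula from the question
        marker=split_markers[split],
        alpha=0.65,
        label=split,
    )

# Any of the scatter collections works here because they share norm and cmap.
fig.colorbar(points, ax=ax, label="interval_size")
ax.legend(title="split")
ax.set_title("True vs Predicted Labels")
ax.set_xlabel("True Labels")
ax.set_ylabel("Predicted Labels")
```

Compared with the seaborn version, this route takes more code but gives a true continuous colorbar for interval_size instead of a binned hue legend.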
