machine_apprentice

Reputation: 439

Bubble Plot by Shape per data split type

I have created a bubble plot of true vs. predicted label values. Is it possible to change the marker shape according to the data split? I want to keep the colors tied to interval_size and have only the shape change with the data split.

Data Table

min          max         y           interval_size    y_pred      split
0.654531     1.021657    0.837415    0.367126         0.838094    train
0.783401     1.261898    1.000000    0.478497         1.022649    valid
-0.166070    0.543749    0.059727    0.709819         0.188840    train
0.493270     1.112610    0.504393    0.619340         0.802940    valid
0.140510     0.572957    0.479063    0.432447         0.356734    train

Plot 1

enter image description here

Plot 2

enter image description here

Plot Code

import matplotlib.pyplot as plt
import seaborn as sns

sns.set_context("talk", font_scale=1.1)
plt.figure(figsize=(10,6))
sns.scatterplot(x="y", 
                y="y_pred",
                size="interval_size",            
                data=df,
                alpha=0.65,
                c=interval_size,
                cmap='viridis', 
                hue = 'split',
                s = (interval_size**2)*50)
# Put the legend outside the axes
plt.legend(bbox_to_anchor=(1.01, 0.54), borderaxespad=0.)

#Plot Characteristics
plt.title("True vs Predicted Labels", fontsize = 36)
plt.xlabel("True Labels", fontsize = 25)
plt.ylabel("Predicted Labels", fontsize = 25)

Question:

I would like to include the validation data as well. How can I differentiate the splits by shape, e.g. triangle vs. circle?

Upvotes: 1

Views: 181

Answers (1)

Timothy Chan

Reputation: 503

Seaborn packs a lot of customization into simple parameters. For your code, you only need to add one keyword argument to your sns.scatterplot() call:

style = 'split',

This varies the marker according to the categorical values in split, using seaborn's default markers. If you want control over which marker each value gets, also pass markers, a dict that maps each categorical value to a Matplotlib marker code:

markers = {'train': 'X', 'valid':'s'},

The marker codes can be found on the Matplotlib website (https://matplotlib.org/3.1.0/api/markers_api.html).
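
If you would rather check marker codes from Python than from the web page, the sketch below looks them up in Matplotlib's own table. The circle/triangle mapping is my assumption based on the shapes named in the question, not something taken from the original code. As far as I know, seaborn does not let you mix filled and line-art markers in one style mapping, so the last line checks that both codes are filled markers.

```python
from matplotlib.markers import MarkerStyle

# Assumed mapping: circle for train, upward triangle for valid
markers = {"train": "o", "valid": "^"}

for split, code in markers.items():
    # MarkerStyle.markers maps each marker code to a short description
    print(split, repr(code), MarkerStyle.markers[code])

# True when every chosen code is one of Matplotlib's filled markers
all_filled = all(code in MarkerStyle.filled_markers for code in markers.values())
print("all filled:", all_filled)
```

The same check works for the 'X' and 's' codes used in this answer.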

The final code should look like:

sns.scatterplot(x="y", 
                y="y_pred",
                size="interval_size",            
                data=df,
                alpha=0.65,
                c=interval_size,
                cmap='viridis', 
                hue = 'split',
                s = (interval_size**2)*50,
                style = 'split',
                markers = {'train': 'X', 'valid':'s'},
)
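
For reference, here is a self-contained sketch built from the five sample rows in the question. It is one possible arrangement, not the asker's exact setup: it assumes the goal is color and size driven by interval_size with shape driven by split, so hue is set to "interval_size" with palette="viridis" rather than passing c and cmap, and sizes=(50, 400) is an arbitrary range I picked in place of the s expression. The Agg backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Sample rows copied from the question's data table
df = pd.DataFrame({
    "y": [0.837415, 1.000000, 0.059727, 0.504393, 0.479063],
    "y_pred": [0.838094, 1.022649, 0.188840, 0.802940, 0.356734],
    "interval_size": [0.367126, 0.478497, 0.709819, 0.619340, 0.432447],
    "split": ["train", "valid", "train", "valid", "train"],
})

fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(
    data=df,
    x="y",
    y="y_pred",
    hue="interval_size",      # color follows interval_size
    palette="viridis",
    size="interval_size",     # bubble area follows interval_size
    sizes=(50, 400),          # assumed min/max marker sizes
    style="split",            # marker shape follows the data split
    markers={"train": "o", "valid": "^"},
    alpha=0.65,
    ax=ax,
)

# Put the legend outside the axes
ax.legend(bbox_to_anchor=(1.01, 1), loc="upper left", borderaxespad=0)
ax.set_title("True vs Predicted Labels")
ax.set_xlabel("True Labels")
ax.set_ylabel("Predicted Labels")
fig.tight_layout()
```

Call plt.show() or fig.savefig(...) afterwards to view the result. Swap the markers dict for {'train': 'X', 'valid': 's'} to get the shapes used above.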

Upvotes: 2
