Reputation: 75
I made a scatter plot using seaborn from three columns ('Category', 'Installs' and 'Gross Income'), with a hue map based on the Category column of my dataset. However, in the legend, besides the Category entries that I want to appear, there is a big smudge at the end showing one of the columns used in the scatter plot, Installs. I'd like to remove this element, but after searching through other questions here and the documentation of seaborn and matplotlib, I'm at a loss on how to proceed.
Here is a snippet of the code I'm working with:
fig, ax = plt.subplots(figsize=(12,6))
ax=sns.scatterplot( x="Installs", y="Gross Income", data=comp_income_inst, hue='Category',
palette=sns.color_palette("cubehelix",len(comp_income_inst)),
size='Installs', sizes=(100,5000), legend='brief', ax=ax)
ax.set(xscale="log", yscale="log")
ax.set(ylabel="Average Income")
ax.set_title("Distribution showing the Earnings of Apps in Various Categories\n", fontsize=18)
plt.rcParams["axes.labelsize"] = 15
# Move the legend to an empty part of the plot
plt.legend(loc='upper left', bbox_to_anchor=(-0.2, -0.06),fancybox=True, shadow=True, ncol=5)
#plt.legend(loc='upper left')
plt.show()
Upvotes: 5
Views: 4048
Reputation: 107687
Actually, that is not a smudge but the size legend produced by size='Installs'. Because the bubble sizes (100, 5000) are so large, the size markers overlap one another in that part of the legend, creating the "smudge" effect. By default, the color and size legends are combined into a single legend.
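To get a feel for why the legend markers collide: matplotlib's scatter sizes are areas in points squared, so a marker's width is roughly the square root of that value. Below is a minimal sketch of that arithmetic for the sizes=(100, 5000) range used in the question. The helper name marker_width_pt is made up for illustration, and the 10 pt text height is only an assumed legend font size, not something read from the plot.

```python
import math

def marker_width_pt(area_pt2):
    """Approximate width, in points, of a scatter marker whose size (area) is area_pt2."""
    return math.sqrt(area_pt2)

# Smallest and largest bubble sizes from sizes=(100, 5000)
smallest = marker_width_pt(100)
largest = marker_width_pt(5000)

# Assumption: legend text is about 10 pt tall (adjust for your own settings)
ASSUMED_FONT_PT = 10

print(f"smallest marker: {smallest:.1f} pt wide")
print(f"largest marker:  {largest:.1f} pt wide")
print(f"largest marker is about {largest / ASSUMED_FONT_PT:.0f}x the assumed text height")
```

With markers several times taller than a line of legend text, the default legend row spacing cannot keep them apart, which is consistent with the overlap described above.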
But rather than removing the size markers as you intend, keep in mind that readers may need to know the range of Installs that the bubble sizes represent. Hence, consider separating the single legend into two legends, and use borderpad and prop size to fit the bubbles and labels.
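If you do still want to drop the size entries, or want to split the legend without counting entries by hand, one option is to cut the combined handle and label lists at the entry that starts the size section. The sketch below assumes your seaborn version inserts the size variable's name ("Installs") as a subtitle entry in the labels; that is worth checking by printing the labels first. The function name split_legend_entries and the stand-in lists are hypothetical, used here only so the example runs without a plot.

```python
def split_legend_entries(handles, labels, size_title="Installs"):
    """Split combined legend entries into (hue part, size part).

    Assumes the size section starts at the first entry whose label equals
    size_title. If no such label exists, everything is treated as the hue part.
    """
    cut = labels.index(size_title) if size_title in labels else len(labels)
    return (handles[:cut], labels[:cut]), (handles[cut:], labels[cut:])

# Stand-in data so the sketch runs on its own; with a real Axes you would use
#   handles, labels = ax.get_legend_handles_labels()
handles = ["h0", "h1", "h2", "h3", "h4", "h5"]
labels = ["Category", "GAME", "TOOLS", "Installs", "1000", "4000"]

(hue_h, hue_l), (size_h, size_l) = split_legend_entries(handles, labels)
print(hue_l)   # ['Category', 'GAME', 'TOOLS']
print(size_l)  # ['Installs', '1000', '4000']
```

On the real plot, passing only the hue part to ax.legend(hue_h, hue_l, ...) would leave the size markers out of the legend entirely, while keeping both parts gives you the inputs for two separate legends.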
Data (seeded, random data)
import numpy as np
import pandas as pd
categs = ['GAME', 'EDUCATION', 'FAMILY', 'WEATHER', 'ENTERTAINMENT', 'PHOTOGRAPHY', 'LIFESTYLE',
'SPORTS', 'PRODUCTIVITY', 'COMMUNICATION', 'PERSONALIZATION', 'HEALTH_AND_FITNESS', 'FOOD_AND_DRINK', 'PARENTING',
'MAPS_AND_NAVIGATION', 'TOOLS', 'VIDEO_PLAYERS', 'BUSINESS', 'AUTO_AND_VEHICLES', 'TRAVEL_AND_LOCAL',
'FINANCE', 'MEDICAL', 'ART_AND_DESIGN', 'SHOPPING', 'NEWS_AND_MAGAZINES', 'SOCIAL', 'DATING', 'BOOKS_AND_REFERENCES',
'LIBRARIES_AND_DEMO', 'EVENTS']
np.random.seed(11222018)
comp_income_inst = pd.DataFrame({'Category': categs,
'Installs': np.random.randint(100, 5000, 30),
'Gross Income': np.random.uniform(0, 30, 30) * 100000
}, columns=['Category', 'Installs', 'Gross Income'])
Graph
import matplotlib.pyplot as plt
import seaborn as sns
fig, ax = plt.subplots(figsize=(13,6))
ax = sns.scatterplot(x="Installs", y="Gross Income", data=comp_income_inst, hue='Category',
palette=sns.color_palette("cubehelix",len(comp_income_inst)),
size='Installs', sizes=(100, 5000), legend='brief', ax=ax)
ax.set(xscale="log", yscale="log")
ax.set(ylabel="Average Income")
ax.set_title("Distribution showing the Earnings of Apps in Various Categories\n", fontsize=20)
plt.rcParams["axes.labelsize"] = 15
# EXTRACT CURRENT HANDLES AND LABELS
h,l = ax.get_legend_handles_labels()
# COLOR LEGEND (FIRST 30 ITEMS)
col_lgd = plt.legend(h[:30], l[:30], loc='upper left',
bbox_to_anchor=(-0.05, -0.50), fancybox=True, shadow=True, ncol=5)
# SIZE LEGEND (LAST 5 ITEMS)
size_lgd = plt.legend(h[-5:], l[-5:], loc='lower center', borderpad=1.6, prop={'size': 20},
bbox_to_anchor=(0.5,-0.45), fancybox=True, shadow=True, ncol=5)
# ADD FORMER (OVERWRITTEN BY LATTER)
plt.gca().add_artist(col_lgd)
plt.show()
Output
Also consider applying seaborn's theme by calling sns.set() just before plotting.
Upvotes: 6