I had the idea of using a single pandas plot to show two different quantities, one on the Y axis and the other as the point size, but I wanted to group them by category, i.e., the X axis is not a numerical value but a set of categories. I'll start by showing my two example dataframes:
earnings:
DayOfWeek Hotel Bar Pool
0 Sunday 41 32 15
1 Monday 45 38 24
2 Tuesday 42 32 27
3 Wednesday 45 37 23
4 Thursday 47 34 26
5 Friday 43 30 19
6 Saturday 48 30 28
and
tips:
DayOfWeek Hotel Bar Pool
0 Sunday 7 8 6
1 Monday 9 7 5
2 Tuesday 5 4 1
3 Wednesday 8 6 7
4 Thursday 4 5 10
5 Friday 3 1 1
6 Saturday 10 2 6
earnings holds the total earnings at the hotel, the bar and the pool, and tips holds the average tip at the same locations. I'll post my code as an answer; please feel free to improve/update it.
Cheers!
See also: Customizing Plot Legends
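This kind of plot is well suited to a grammar-of-graphics library such as plotnine (a Python implementation of ggplot2's grammar of graphics): reshape both tables into long form, then map location to color and tip to point size.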
from io import StringIO

import pandas as pd
from plotnine import aes, geom_point, ggplot
# Create data
s1 = StringIO("""
DayOfWeek Hotel Bar Pool
0 Sunday 41 32 15
1 Monday 45 38 24
2 Tuesday 42 32 27
3 Wednesday 45 37 23
4 Thursday 47 34 26
5 Friday 43 30 19
6 Saturday 48 30 28
""")
s2 = StringIO("""
DayOfWeek Hotel Bar Pool
0 Sunday 7 8 6
1 Monday 9 7 5
2 Tuesday 5 4 1
3 Wednesday 8 6 7
4 Thursday 4 5 10
5 Friday 3 1 1
6 Saturday 10 2 6
""")
# Read data (raw string so "\s" isn't treated as a string escape sequence)
earnings = pd.read_csv(s1, sep=r"\s+")
tips = pd.read_csv(s2, sep=r"\s+")
# Make tidy data
kwargs = dict(value_vars=['Hotel', 'Bar', 'Pool'], id_vars=['DayOfWeek'], var_name='location')
days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
earnings = pd.melt(earnings, value_name='earnings', **kwargs)
tips = pd.melt(tips, value_name='tip', **kwargs)
df = pd.merge(earnings, tips, on=['DayOfWeek', 'location'])
df['DayOfWeek'] = pd.Categorical(df['DayOfWeek'], categories=days, ordered=True)
# Create plot
p = (ggplot(df)
     + geom_point(aes('DayOfWeek', 'earnings', color='location', size='tip'))
     )
print(p)
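The reshaping step is what makes this work: pd.melt turns the three location columns into a single `location` column, and the merge pairs each earnings value with the matching tip, so one column can drive the y axis and another the point size. Here's a sketch of just that step, with the example tables typed in as DataFrames instead of parsed from StringIO (the `long_earnings`/`long_tips` names are introduced here), so you can inspect the frame that gets handed to ggplot:

```python
import pandas as pd

days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']

# The two example tables from the question, typed in directly.
earnings = pd.DataFrame({'DayOfWeek': days,
                         'Hotel': [41, 45, 42, 45, 47, 43, 48],
                         'Bar': [32, 38, 32, 37, 34, 30, 30],
                         'Pool': [15, 24, 27, 23, 26, 19, 28]})
tips = pd.DataFrame({'DayOfWeek': days,
                     'Hotel': [7, 9, 5, 8, 4, 3, 10],
                     'Bar': [8, 7, 4, 6, 5, 1, 2],
                     'Pool': [6, 5, 1, 7, 10, 1, 6]})

# Wide -> long: one row per (DayOfWeek, location) pair with a single value column.
kwargs = dict(id_vars=['DayOfWeek'], value_vars=['Hotel', 'Bar', 'Pool'],
              var_name='location')
long_earnings = pd.melt(earnings, value_name='earnings', **kwargs)
long_tips = pd.melt(tips, value_name='tip', **kwargs)

# Join the two long frames so every row carries both the y value and the size value.
df = pd.merge(long_earnings, long_tips, on=['DayOfWeek', 'location'])

# An ordered categorical pins the Sunday..Saturday order of the x axis.
df['DayOfWeek'] = pd.Categorical(df['DayOfWeek'], categories=days, ordered=True)

print(df.sort_values(['DayOfWeek', 'location']).head(6))
```

The ordered Categorical fixes the Sunday-to-Saturday order explicitly instead of leaving it to the plotting library's default ordering of plain strings.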
Here's the code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
earnings = pd.read_csv('earnings.csv', sep=';')
tips = pd.read_csv('tips.csv', sep=';')
print(earnings)
print(tips)
earnings['index'] = earnings.index
height, width = earnings.shape
cols = list(earnings.columns.values)
colors = ['r', 'g', 'b']
# Thanks to
# https://stackoverflow.com/questions/43812911/adding-second-legend-to-scatter-plot
plt.rcParams["figure.subplot.right"] = 0.8
plt.figure(figsize=(8,4))
# get axis
ax = plt.gca()
# plot each column, each row will be in a different X coordinate, creating a category
for x in range(1, width-1):
    earnings.plot.scatter(x='index', y=cols[x], label=None,
                          xticks=earnings.index, c=colors[x-1],
                          s=tips[cols[x]].multiply(10), linewidth=0, ax=ax)
# This second set of 'dummy' (empty) plots only creates the legend. If we used the
# plots above, the circles in the legend could have different sizes.
for x in range(1, width-1):
    ax.scatter([], [], label=cols[x], c=colors[x-1], s=30, linewidths=0)
# Label the X ticks with the categories' names
ax.set_xticklabels(earnings.loc[:,'DayOfWeek'])
ax.set_ylabel("Total Earnings")
ax.set_xlabel("Day of Week")
leg = plt.legend(title="Location", loc=(1.03,0))
ax.add_artist(leg)
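# Create a second legend for the points' scale.
# scatter's s is an area (points**2) while ms is a diameter (points), so take the
# square root of the same tip*10 value used for the data points.
h = [plt.plot([], [], color="gray", marker="o", ms=np.sqrt(10 * i), mew=0, ls="")[0]
     for i in range(1, 10, 2)]
plt.legend(handles=h, labels=[str(i) for i in range(1, 10, 2)],
           loc=(1.03, 0.5), title="Avg. Tip")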
plt.show()
# See also:
# https://jakevdp.github.io/PythonDataScienceHandbook/04.06-customizing-legends.html
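A note on the size legend: in scatter, `s` is the marker's area in points², whereas `markersize` (`ms`) on a line marker is its diameter in points. Because the data points are drawn with `s=tip*10`, a legend marker for an average tip t needs `ms=sqrt(10*t)` to match a data point with that tip. Below is a minimal sketch under those assumptions, with the example data typed in (the CSV files aren't shown) and Line2D proxies used for both legends in place of empty plots; `SCALE` is a name introduced here for the factor 10:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.lines import Line2D

days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
earnings = pd.DataFrame({'DayOfWeek': days,
                         'Hotel': [41, 45, 42, 45, 47, 43, 48],
                         'Bar': [32, 38, 32, 37, 34, 30, 30],
                         'Pool': [15, 24, 27, 23, 26, 19, 28]})
tips = pd.DataFrame({'DayOfWeek': days,
                     'Hotel': [7, 9, 5, 8, 4, 3, 10],
                     'Bar': [8, 7, 4, 6, 5, 1, 2],
                     'Pool': [6, 5, 1, 7, 10, 1, 6]})

SCALE = 10  # name introduced here: marker area (points**2) = SCALE * tip
locations = ['Hotel', 'Bar', 'Pool']
colors = ['r', 'g', 'b']

fig, ax = plt.subplots(figsize=(8, 4))
fig.subplots_adjust(right=0.8)  # leave room on the right for the two legends

for location, color in zip(locations, colors):
    # scatter's `s` is the marker AREA in points**2
    ax.scatter(earnings.index, earnings[location], s=tips[location] * SCALE,
               c=color, linewidths=0)

ax.set_xticks(earnings.index)
ax.set_xticklabels(earnings['DayOfWeek'])
ax.set_xlabel('Day of Week')
ax.set_ylabel('Total Earnings')

# Legend 1: one fixed-size proxy marker per location (area 30, i.e. diameter sqrt(30)).
color_handles = [Line2D([], [], color=color, marker='o', ls='', mew=0,
                        ms=np.sqrt(30), label=location)
                 for location, color in zip(locations, colors)]
color_legend = ax.legend(handles=color_handles, title='Location', loc=(1.03, 0))
ax.add_artist(color_legend)  # keep it on the axes when the second legend is created

# Legend 2: Line2D `markersize` is a DIAMETER in points, so convert area -> diameter.
tip_values = [1, 3, 5, 7, 9]
size_handles = [Line2D([], [], color='gray', marker='o', ls='', mew=0,
                       ms=np.sqrt(SCALE * t))
                for t in tip_values]
size_legend = ax.legend(handles=size_handles, labels=[str(t) for t in tip_values],
                        title='Avg. Tip', loc=(1.03, 0.5))
plt.show()
```

On Matplotlib 3.1 or newer, `PathCollection.legend_elements(prop="sizes", func=...)` can also build size handles from a scatter automatically.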