Reputation: 333
I have been struggling to plot a bar chart that contains both very small and very large values. I tried plotting the chart with two y-axes.
However, the output does not represent the data well: because the green columns are on a different y-axis, they cannot be compared with the other series. Is there a way to show the green columns for 2015 - 2019 on the first y-axis, with only the 2020 data represented on the second y-axis? Or is there any other good solution?
The code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
dict_ = {'date': [pd.Timestamp('20150720'),pd.Timestamp('20160720'),pd.Timestamp('20170720'),pd.Timestamp('20180720'),pd.Timestamp('20190720'),pd.Timestamp('20200720')],
'BKNG': [15.22, 6.36, 5.05, 5, 9.3641, -3],
'MCD' : [25.22, 11.36, 7.05, 9, 8.3641, -6],
'YUM' : [52.22, 21.36, 25.05, 26, 21.3641, -1000]
}
df = pd.DataFrame(dict_)
df['date'] = df['date'].dt.year
df.set_index('date',inplace=True)
fig1,ax = plt.subplots(figsize=(10,6))
ax.bar(df.index+0.0, df['BKNG'],width=0.1,label='BKNG')
ax.bar(df.index+0.1, df['MCD'],width=0.1,label='MCD')
plt.grid(True)
plt.legend()
plt.xlabel('date')
plt.ylabel('value')
plt.title('ROE')
ax2 = ax.twinx()
ax2.bar(df.index+0.2, df['YUM'],width=0.1,label='YUM', color='g')
plt.legend(loc=3)
plt.ylabel('YUM')
Upvotes: 1
Views: 1896
Reputation: 16147
You will probably see suggestions to scale your data, which loses the original units of measurement. In a lot of cases this is fine, but in a business context the result is sometimes not interpretable.
If you want to preserve the unit of measure, you can create two charts, modify the axis ranges, then draw short diagonal lines to indicate that there is a gap in the axis. This is a fairly traditional approach to plotting values with such large gaps, and most people tend to understand it.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
dict_ = {'date': [pd.Timestamp('20150720'),pd.Timestamp('20160720'),pd.Timestamp('20170720'),pd.Timestamp('20180720'),pd.Timestamp('20190720'),pd.Timestamp('20200720')],
'BKNG': [15.22, 6.36, 5.05, 5, 9.3641, -3],
'MCD' : [25.22, 11.36, 7.05, 9, 8.3641, -6],
'YUM' : [52.22, 21.36, 25.05, 26, 21.3641, -1000]
}
df = pd.DataFrame(dict_)
df['date'] = df['date'].dt.year
df = df.set_index('date').stack().reset_index()
df.columns = ['date','ticker','value']
f, ax = plt.subplots(2, 1, sharex=True, figsize=(10,6))
f.tight_layout()
# plot the same data on both axes
sns.barplot(x='date',y='value',hue='ticker',data=df, ax=ax[0])
ax[0].set_ylim(-100,100)
sns.barplot(x='date',y='value',hue='ticker',data=df, ax=ax[1])
ax[1].set_ylim(-1050,-900)
ax[0].spines['bottom'].set_visible(False)
ax[0].set(xlabel='', ylabel='')
ax[0].spines['top'].set_visible(False)
ax[0].xaxis.tick_top()
ax[0].tick_params(labeltop=False) # don't put tick labels at the top
ax[1].xaxis.tick_bottom()
ax[1].legend([],[], frameon=False)
d = .01 # how big to make the diagonal lines in axes coordinates
# arguments to pass to plot, just so we don't keep repeating them
kwargs = dict(transform=ax[0].transAxes, color='k', clip_on=False)
ax[0].plot((-d, +d), (-d, +d), **kwargs) # top-left diagonal
ax[0].plot((1 - d, 1 + d), (-d, +d), **kwargs) # top-right diagonal
kwargs.update(transform=ax[1].transAxes) # switch to the bottom axes
ax[1].plot((-d, +d), (1 - d, 1 + d), **kwargs) # bottom-left diagonal
ax[1].plot((1 - d, 1 + d), (1 - d, 1 + d), **kwargs) # bottom-right diagonal
# Because the lines are drawn in axes coordinates, changing the distance
# between ax[0] and ax[1] via f.subplots_adjust(hspace=...) or plt.subplot_tool()
# moves the diagonal lines accordingly, so they stay right at the tips
# of the spines they are 'breaking'
plt.show()
Upvotes: 2
Reputation: 756
If you want to use the first y-axis to show the data through 2019 and the second y-axis to show the 2020 data, you can use:
# assumes df from the question, indexed by year
fig1,ax = plt.subplots(figsize=(10,6))
ax.bar(df.index[:-1]+0.0, df.iloc[:-1,0],width=0.1,label='BKNG')
ax.bar(df.index[:-1]+0.1, df.iloc[:-1,1],width=0.1,label='MCD')
ax.bar(df.index[:-1]+0.2, df.iloc[:-1,2],width=0.1,label='YUM')
plt.grid(True)
plt.legend()
plt.xlabel('date')
plt.ylabel('value')
plt.title('ROE')
ax2 = ax.twinx()
ax2.bar(df.index[-1]+0.0, df.iloc[-1,0],width=0.1,label='BKNG')
ax2.bar(df.index[-1]+0.1, df.iloc[-1,1],width=0.1,label='MCD')
ax2.bar(df.index[-1]+0.2, df.iloc[-1,2],width=0.1,label='YUM', color='green')
ax2.set_ylim((-1001,10))
ax2.set_yscale('symlog')
Alternatively, I would use a symlog scale for all the data:
fig1,ax = plt.subplots(figsize=(10,6))
ax.bar(df.index+0.0, df.iloc[:,0],width=0.1,label='BKNG')
ax.bar(df.index+0.1, df.iloc[:,1],width=0.1,label='MCD')
ax.bar(df.index+0.2, df.iloc[:,2],width=0.1,label='YUM')
plt.grid(True)
ax.set_yscale('symlog')
plt.legend()
plt.xlabel('date')
plt.ylabel('value')
plt.title('ROE')
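To get a feel for why a symmetric log scale helps here, the following is a simplified sketch of the idea: values close to zero are mapped linearly, and larger magnitudes are compressed logarithmically while keeping their sign. The function `symlog` and its `linthresh` argument are illustrative names chosen for this sketch. It is not matplotlib's exact transform, which has additional parameters.

```python
import math

def symlog(value, linthresh=1.0):
    """Simplified symmetric-log mapping (illustration only, not matplotlib's exact transform)."""
    magnitude = abs(value)
    if magnitude <= linthresh:
        # linear region around zero, so zero and small values stay well defined
        return value / linthresh
    # logarithmic region: compress the magnitude, keep the sign
    return math.copysign(1.0 + math.log10(magnitude / linthresh), value)

# a few values from the question's data
for v in [52.22, 21.36, -3, -6, -1000]:
    print(v, round(symlog(v), 2))
```

With this mapping, -1000 ends up only a few units away from -6 instead of hundreds, so all bars remain visible on one axis. The trade-off is that bar heights are no longer proportional to the values.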
Upvotes: 1