janicewww

Reputation: 333

Representing a large number when plotting a bar chart with Matplotlib

I have been struggling to plot a bar chart that contains both very small and very large values. I tried plotting the chart with two y-axes.

However, the output does not represent the data well: because the green bars sit on a different y-axis, they cannot be compared with the other series. Is there a way to show the green series for 2015 - 2019 on the first y-axis, with only the 2020 data on the second y-axis? Or is there a better solution?

The code:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

dict_ = {'date': [pd.Timestamp('20150720'),pd.Timestamp('20160720'),pd.Timestamp('20170720'),pd.Timestamp('20180720'),pd.Timestamp('20190720'),pd.Timestamp('20200720')],
            'BKNG': [15.22, 6.36, 5.05, 5, 9.3641, -3],
            'MCD' : [25.22, 11.36, 7.05, 9, 8.3641, -6],
            'YUM' : [52.22, 21.36, 25.05, 26, 21.3641, -1000]
    
}

df = pd.DataFrame(dict_)
df['date'] = df['date'].dt.year
df.set_index('date',inplace=True)

fig1,ax = plt.subplots(figsize=(10,6))

ax.bar(df.index+0.0, df['BKNG'],width=0.1,label='BKNG')
ax.bar(df.index+0.1, df['MCD'],width=0.1,label='MCD')
plt.grid(True)
plt.legend()
plt.xlabel('date')
plt.ylabel('value')
plt.title('ROE')

ax2 = ax.twinx()
ax2.bar(df.index+0.2, df['YUM'],width=0.1,label='YUM', color='g')
plt.legend(loc=3)
plt.ylabel('YUM')


Upvotes: 1

Views: 1896

Answers (2)

Chris

Reputation: 16147

You will probably see suggestions to scale your data, which loses the original units of measurement. In a lot of cases this is fine, but in a business context the result is sometimes hard to interpret.

If you want to preserve the unit of measure, you can create two charts, modify the axis ranges, then use some diagonal lines to indicate that there's a gap in the axis. This is a pretty traditional approach to plotting values with such large gaps and tends to be understandable by most people.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

dict_ = {'date': [pd.Timestamp('20150720'),pd.Timestamp('20160720'),pd.Timestamp('20170720'),pd.Timestamp('20180720'),pd.Timestamp('20190720'),pd.Timestamp('20200720')],
            'BKNG': [15.22, 6.36, 5.05, 5, 9.3641, -3],
            'MCD' : [25.22, 11.36, 7.05, 9, 8.3641, -6],
            'YUM' : [52.22, 21.36, 25.05, 26, 21.3641, -1000]
    
}

df = pd.DataFrame(dict_)
df['date'] = df['date'].dt.year

df = df.set_index('date').stack().reset_index()
df.columns = ['date','ticker','value']


f, ax = plt.subplots(2, 1, sharex=True, figsize=(10,6))
f.tight_layout()
# plot the same data on both axes
sns.barplot(x='date',y='value',hue='ticker',data=df, ax=ax[0])
ax[0].set_ylim(-100,100)
sns.barplot(x='date',y='value',hue='ticker',data=df, ax=ax[1])
ax[1].set_ylim(-1050,-900)

ax[0].spines['bottom'].set_visible(False)
ax[0].set(xlabel='', ylabel='')
ax[0].spines['top'].set_visible(False)
ax[0].xaxis.tick_top()
ax[0].tick_params(labeltop=False)  # don't put tick labels at the top
ax[1].xaxis.tick_bottom()
ax[1].legend([],[], frameon=False)
d = .01  # how big to make the diagonal lines in axes coordinates
# arguments to pass to plot, just so we don't keep repeating them
kwargs = dict(transform=ax[0].transAxes, color='k', clip_on=False)
ax[0].plot((-d, +d), (-d, +d), **kwargs)        # top-left diagonal
ax[0].plot((1 - d, 1 + d), (-d, +d), **kwargs)  # top-right diagonal

kwargs.update(transform=ax[1].transAxes)  # switch to the bottom axes
ax[1].plot((-d, +d), (1 - d, 1 + d), **kwargs)  # bottom-left diagonal
ax[1].plot((1 - d, 1 + d), (1 - d, 1 + d), **kwargs)  # bottom-right diagonal

# What's cool about this is that now if we vary the distance between
# ax[0] and ax[1] via f.subplots_adjust(hspace=...) or plt.subplot_tool(),
# the diagonal lines will move accordingly, and stay right at the tips
# of the spines they are 'breaking'

plt.show()
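
The limits passed to set_ylim above were chosen by eye. As a rough sketch only, one way to derive them from the data is to split the values at the widest gap and pad each group. Note that split_at_largest_gap and padded_limits are hypothetical helper names (not part of matplotlib or seaborn), and the 10% padding is an arbitrary assumption.

```python
def split_at_largest_gap(values):
    """Split the sorted values into (low_group, high_group) at the widest gap."""
    ordered = sorted(values)
    # Pair each gap with the index of the value just below it.
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    _, idx = max(gaps)
    return ordered[:idx + 1], ordered[idx + 1:]


def padded_limits(group, pad=0.1):
    """Return (lower, upper) axis limits around a group, with some padding."""
    lo, hi = min(group), max(group)
    # Fall back to a fraction of the magnitude when the group is a single value.
    margin = (hi - lo) * pad or abs(hi) * pad or 1.0
    return lo - margin, hi + margin


# Same numbers as in the question, flattened into one list.
all_values = [15.22, 6.36, 5.05, 5, 9.3641, -3,
              25.22, 11.36, 7.05, 9, 8.3641, -6,
              52.22, 21.36, 25.05, 26, 21.3641, -1000]

low_group, high_group = split_at_largest_gap(all_values)
lower_limits = padded_limits(low_group)    # candidate limits for the bottom axes
upper_limits = padded_limits(high_group)   # candidate limits for the top axes
print(lower_limits, upper_limits)
```

The two tuples could then be passed on as ax[0].set_ylim(*upper_limits) and ax[1].set_ylim(*lower_limits). They will not match the hand-picked values above exactly.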


Upvotes: 2

solopiu

Reputation: 756

If you want to use the first y-axis for the data through 2019 and the second y-axis for the 2020 data, you can use:

fig1,ax = plt.subplots(figsize=(10,6))

ax.bar(df.index[:-1]+0.0, df.iloc[:-1,0],width=0.1,label='BKNG')
ax.bar(df.index[:-1]+0.1, df.iloc[:-1,1],width=0.1,label='MCD')
ax.bar(df.index[:-1]+0.2, df.iloc[:-1,2],width=0.1,label='YUM')
plt.grid(True)
plt.legend()
plt.xlabel('date')
plt.ylabel('value')
plt.title('ROE')

ax2 = ax.twinx()
ax2.bar(df.index[-1]+0.0, df.iloc[-1,0],width=0.1,label='BKNG')
ax2.bar(df.index[-1]+0.1, df.iloc[-1,1],width=0.1,label='MCD')
ax2.bar(df.index[-1]+0.2, df.iloc[-1,2],width=0.1,label='YUM', color='green')
ax2.set_ylim((-1001,10))
ax2.set_yscale('symlog')

Alternatively, I would use symlog for all the data:

fig1,ax = plt.subplots(figsize=(10,6))

ax.bar(df.index+0.0, df.iloc[:,0],width=0.1,label='BKNG')
ax.bar(df.index+0.1, df.iloc[:,1],width=0.1,label='MCD')
ax.bar(df.index+0.2, df.iloc[:,2],width=0.1,label='YUM')
plt.grid(True)
ax.set_yscale('symlog')
plt.legend()
plt.xlabel('date')
plt.ylabel('value')
plt.title('ROE')
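
To see why symlog helps here: the scale is linear close to zero and logarithmic further out, so -1000 is compressed to a few decades while the small values stay readable. The snippet below only illustrates that idea in plain Python. The symlog function is a hypothetical helper with an assumed threshold of 1, and it is not the exact transform matplotlib applies.

```python
import math


def symlog(value, linthresh=1.0):
    """Illustrative symmetric-log mapping (an assumption, not matplotlib's formula).

    Values within +/- linthresh are mapped linearly; larger magnitudes grow
    with log10 and keep their sign.
    """
    if abs(value) <= linthresh:
        return value / linthresh
    return math.copysign(1.0 + math.log10(abs(value) / linthresh), value)


# YUM series from the question.
yum = [52.22, 21.36, 25.05, 26, 21.3641, -1000]
compressed = [symlog(v) for v in yum]

for raw, mapped in zip(yum, compressed):
    print(f"{raw:>10} -> {mapped:.2f}")
```

After the mapping, the 2020 value is only a little larger in magnitude than the other years instead of dwarfing them, which is what makes all the bars visible on one axis.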


Upvotes: 1
