Ad D

Reputation: 191

PYTHON : Best way to reproduce this graph?

I am trying to reproduce something similar to this very nice plot I found online (made in R):

[Image: the reference plot made in R]

I am trying to find a way to get the same result in Python. So far I have managed to produce the following using seaborn stripplot, seaborn pointplot, and axvline for the median:

[Image: my current attempt]

Data preprocessing aside (I know the results are not the same yet), I am wondering how to draw a colored line from each category's median point to the vertical median line.

Should I use a lollipop plot starting from the median value instead of the pointplot?

EDIT: Thanks to Sheldore's input, I used hlines, with the following result:

[Image: result after adding the hlines]

Full code below:

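import matplotlib.pyplot as plt
import seaborn as sns

# create rank: regions ordered by mean Value, ascending
# (merged_df is the pre-processed DataFrame)
ranks = merged_df.groupby("region")["Value"].mean().fillna(0).sort_values(ascending=True).index
# y positions for the hlines later
range_plot = range(len(ranks))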

#Create figure
plt.figure(figsize = (12,7))

# define colors  https://learnui.design/tools/data-color-picker.html#palette
#colors= ['#2a6d85','#198992','#3ba490','#74bc84','#b6cf78','#ffdc7a']
colors= ['#003f5c','#444e86','#955196','#dd5182','#ff6e54','#ffa600']
# Set the font to be serif, rather than sans.
# sns.set() also resets the palette and context, so call it before them.
sns.set(font='serif')
sns.set_palette(sns.color_palette(colors))
sns.set_context("paper")

# Make the background white, and specify the
# specific font family
sns.set_style("white", {
        "font.family": "serif",
        "font.serif": ["Times", "Palatino", "serif"]})

#Create stripplot
ax = sns.stripplot(x='Value',
              y='region',
              data=merged_df,
              palette=sns.color_palette(colors),
              size=6,
              linewidth=0.4,
              alpha=.15,
              zorder=1,
              order = ranks)
#Create Conditional means
ax = sns.pointplot(x="Value", 
              y="region",
              data=merged_df,
              palette=sns.color_palette(colors),
              scale=2,
              ci=None,
              edgecolors="red",
              linewidth=4,
              order = ranks,
              zorder=3)
# add mean line (the code uses the overall mean of Value, not the median)
plt.axvline(merged_df.Value.mean(),
            color='grey',
            linestyle='dashed',
            linewidth=1,
            zorder=0)
plt.text(x=merged_df.Value.mean()+1,
         y=-0.1,
         s= 'Mean: {number:.{digits}f}'.format(number=merged_df.Value.mean(),digits=0))
# Add category line
mean = merged_df.Value.mean()
x_arr = merged_df.groupby("region")["Value"].mean().fillna(0).sort_values(ascending=True)
plt.hlines(y=range_plot,
           xmin=mean,
           xmax=x_arr,
           colors=colors,
           linewidth=3,
           zorder=3)

# Add the title
plt.text(x= 4.2,
         y= -0.65,
         s = '{}'.format(merged_df.Indicator.iloc[0]),
         fontsize = 22)
# Adjust the size of the tick labels and axis labels
plt.tick_params(axis='both', which='major', labelsize=15)
plt.tick_params(axis='both', which='minor', labelsize=15)
plt.xlabel('Student to teacher ratio',fontsize=15)
plt.ylabel('')

# Add the source
plt.text(x= merged_df.Value.max()-25,
         y= 6.4,
         s = 'Data: UNESCO institute for statistics',fontsize = 12, color = 'grey')

plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.savefig("UNESCO.jpeg", transparent=True, dpi=300)

Upvotes: 1

Views: 604

Answers (1)

Sheldore

Reputation: 39062

You have the following options, among others:

1) Either use a vertical lollipop plot as presented here

2) Or use plt.hlines to draw a horizontal line for each country from the vertical median (24) to the dot, as shown here. A modification of the latter example could look something like this:

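import numpy
from matplotlib import pyplot

# position of the vertical reference line
mean = 24

# random dot positions around the reference line, one per row
x_arr = mean - numpy.random.randint(-10, 10, 10)
y_arr = numpy.arange(10)

# horizontal segment from the reference line to each dot
pyplot.hlines(y_arr, mean, x_arr, color='red')
pyplot.plot(x_arr, y_arr, 'o')
# dashed vertical reference line spanning the full axes height
pyplot.axvline(mean, 0, 1, color='k', linestyle='--')
pyplot.xlim(8, 82)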

enter image description here
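To get one colored line per category rather than a single red color, the same hlines call can take a list of colors. Below is a minimal sketch of that idea on synthetic data; the region names "A" to "D", the random values, and the variable names are placeholders of mine, not the UNESCO data from the question. It uses plain matplotlib so it runs without seaborn.

```python
import numpy as np
from matplotlib import pyplot as plt

# Synthetic stand-in for the real data: 50 values per (hypothetical) region
rng = np.random.default_rng(0)
regions = ["A", "B", "C", "D"]
colors = ['#003f5c', '#955196', '#dd5182', '#ffa600']
values = {name: rng.normal(loc=20 + 5 * i, scale=4, size=50)
          for i, name in enumerate(regions)}

# Overall mean of all observations, and per-region means sorted ascending
overall_mean = np.concatenate(list(values.values())).mean()
group_means = {name: v.mean() for name, v in values.items()}
order = sorted(regions, key=group_means.get)
y_pos = np.arange(len(order))
x_means = np.array([group_means[name] for name in order])

fig, ax = plt.subplots(figsize=(8, 4))
# Faint raw observations, one row per region
for y, name, color in zip(y_pos, order, colors):
    ax.scatter(values[name], np.full(values[name].size, y),
               color=color, alpha=0.15, zorder=1)
# One colored segment per region, from the overall mean to that region's mean
segments = ax.hlines(y=y_pos, xmin=overall_mean, xmax=x_means,
                     colors=colors, linewidth=3, zorder=2)
# Large dot at each region mean
ax.scatter(x_means, y_pos, c=colors, s=150, zorder=3)
# Dashed vertical reference line at the overall mean
ax.axvline(overall_mean, color='grey', linestyle='dashed', linewidth=1, zorder=0)
ax.set_yticks(y_pos)
ax.set_yticklabels(order)
```

Call plt.show() or fig.savefig(...) afterwards to see the figure. The important detail is that y_pos, x_means, and colors are all built in the same sorted order, so each segment ends at the dot of its own category.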

Upvotes: 1
