keenan

Reputation: 504

How to turn x-axis values into a legend for a matplotlib bar graph

I am creating bar graphs for data that comes from pandas Series. However, the names (x-axis values) are extremely long. If they are rotated 90 degrees, it is impossible to read the entire name and still get a good view of the graph, and 45 degrees is not much better. I am looking for a way to label the x-axis with the numbers 1-15 and then have a legend listing the name that corresponds to each number.

This is the full function I have so far, including creating the Series from a larger dataframe:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.patches import Patch

def graph_average_expressions(TAD_matches, CAGE):
    """graphs the top 15 expression levels of each lncRNA"""

    for i, row in TAD_matches.iterrows():
        mask = (
            CAGE['short_description'].isin(row['peak_ID'])
        )  # finds expression level for peaks in an lncRNA
        average = CAGE[mask].iloc[:, 8:].mean(axis=0).astype('float32').sort_values().tail(n=15)
        # made a new Series of the top 15 highest expression levels for all averaged groups
        # a group is the peaks belonging to the same lncRNA
        cell_type = list(average.index)
        expression = list(average.values)
        average_df = pd.DataFrame(
            list(zip(cell_type, expression)),
            columns=['cell_type', 'expression']
        )
        colors = sns.color_palette(
            'husl',
            n_colors=len(cell_type)
        )
        p = sns.barplot(
            x=average_df.index,
            y='expression',
            data=average_df,
            palette=colors
        )
        cmap = dict(zip(average_df.cell_type, colors))
        patches = [Patch(color=v, label=k) for k, v in cmap.items()]
        plt.legend(
            handles=patches,
            bbox_to_anchor=(1.04, 0.5),
            loc='center left',
            borderaxespad=0
        )
        plt.title('expression_levels_of_lncRNA_' + row['lncRNA_name'])
        plt.xlabel('cell_type')
        plt.ylabel('expression')
        plt.show()

Here is an example of the data I am graphing:

CD14_monocytes_treated_with_Group_A_streptococci_donor2.CNhs13532         1.583428
Neutrophils_donor3.CNhs11905                                              1.832527
CD14_monocytes_treated_with_Trehalose_dimycolate_TDM_donor2.CNhs13483     1.858384
CD14_monocytes_treated_with_Candida_donor1.CNhs13473                      1.873013
CD14_Monocytes_donor2.CNhs11954                                           2.041607
CD14_monocytes_treated_with_Candida_donor2.CNhs13488                      2.112112
CD14_Monocytes_donor3.CNhs11997                                           2.195365
CD14_monocytes_treated_with_Group_A_streptococci_donor1.CNhs13469         2.974203
Eosinophils_donor3.CNhs12549                                              3.566822
CD14_monocytes_treated_with_lipopolysaccharide_donor1.CNhs13470           3.685389
CD14_monocytes_treated_with_Salmonella_donor1.CNhs13471                   4.409062
CD14_monocytes_treated_with_Candida_donor3.CNhs13494                      5.546789
CD14_monocytes_-_treated_with_Group_A_streptococci_donor3.CNhs13492       5.673991
Neutrophils_donor1.CNhs10862                                              8.352045
Neutrophils_donor2.CNhs11959                                             11.595509

With the updated code above, this is the graph I get, but it has no legend or title. [image of my graph]

Upvotes: 0

Views: 5630

Answers (2)

Trenton McKinney

Reputation: 62403

Set up the dataframe

  1. Make sure the index of the dataframe to be plotted is reset, so it consists of integers beginning at 0, and use the index as the x-axis.
  2. Plot the values on the y-axis.
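In the question, the data is a Series whose index holds the long names (`average` inside `graph_average_expressions`). A minimal sketch of step 1, assuming that shape, moves the names into a 'names' column and leaves a 0-based integer index. The three rows are copied from the question's data only to keep the example short.

```python
import pandas as pd

# Stand-in for the question's `average` Series: names in the index, values as data
average = pd.Series(
    [1.583428, 8.352045, 11.595509],
    index=['CD14_monocytes_treated_with_Group_A_streptococci_donor2.CNhs13532',
           'Neutrophils_donor1.CNhs10862',
           'Neutrophils_donor2.CNhs11959'],
)

# rename_axis names the index, so reset_index turns it into a 'names' column;
# reset_index also replaces the index with integers 0..n-1 for the x-axis
df = average.rename_axis('names').reset_index(name='values')
print(df)
```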

Option 1A: Seaborn hue

  • The easiest way is probably to use seaborn.barplot with the hue parameter set to 'names'
  • Seaborn: Choosing color palettes
    • This plot uses husl
    • Additional options for the husl palette can be found at seaborn.husl_palette
  • The bars will not be centered for this option, because they are placed according to the number of hue levels, and there are 15 levels in this case.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# plt styling parameters ('seaborn-v0_8' was named 'seaborn' before matplotlib 3.6)
plt.style.use('seaborn-v0_8')
plt.rcParams['figure.figsize'] = (16.0, 10.0)
plt.rcParams["patch.force_edgecolor"] = True

# create a color palette the length of the dataframe
colors = sns.color_palette('husl', n_colors=len(df))

# plot, coloring each bar by its name with the husl palette
p = sns.barplot(x=df.index, y='values', data=df, hue='names', palette=colors)

# place the legend to the right of the plot
plt.legend(bbox_to_anchor=(1.04, 0.5), loc='center left', borderaxespad=0)

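Because the reset index starts at 0, the ticks read 0 to 14 rather than the 1 to 15 asked for in the question. One way to handle that, sketched here with plain matplotlib and a made-up three-row frame, keeps one tick per bar and only changes the labels. The same two calls should also work on the Axes `p` returned by `sns.barplot`, which places its bars at 0..n-1.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data with the same column layout as the answer's df
df = pd.DataFrame({'names': ['first_long_name', 'second_long_name', 'third_long_name'],
                   'values': [1.5, 3.2, 2.4]})

fig, ax = plt.subplots()
ax.bar(df.index, df['values'])  # bars sit at x = 0, 1, 2

# Keep one tick per bar, but label them 1..n instead of 0..n-1
ax.set_xticks(range(len(df)))
ax.set_xticklabels([str(n) for n in range(1, len(df) + 1)])
```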

Option 1B: Seaborn palette

  • Using the palette parameter instead of hue places the bars directly over the ticks.
  • This option requires "manually" associating 'names' with the colors and creating the legend.
    • patches uses Patch to create each legend item (the colored rectangle associated with a name).
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.patches import Patch

# create a color palette the length of the dataframe
colors = sns.color_palette('husl', n_colors=len(df))

# plot
p = sns.barplot(x=df.index, y='values', data=df, palette=colors)

# create color map with colors and df.names
cmap = dict(zip(df.names, colors))

# create the rectangles for the legend
patches = [Patch(color=v, label=k) for k, v in cmap.items()]

# add the legend
plt.legend(handles=patches, bbox_to_anchor=(1.04, 0.5), loc='center left', borderaxespad=0)

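To get exactly the layout from the question (numbers on the axis and a legend saying which name each number stands for), the Patch labels can carry the number as well. A sketch with plain matplotlib; the made-up data and the tab10 colormap are assumptions, not part of the answer:

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.patches import Patch

df = pd.DataFrame({'names': ['first_long_name', 'second_long_name', 'third_long_name'],
                   'values': [1.5, 3.2, 2.4]})

numbers = [str(n) for n in range(1, len(df) + 1)]   # tick labels 1..n
colors = [plt.cm.tab10(i) for i in range(len(df))]  # one color per bar

fig, ax = plt.subplots()
ax.bar(range(len(df)), df['values'], color=colors)
ax.set_xticks(range(len(df)))
ax.set_xticklabels(numbers)

# Each legend entry pairs the bar's color with "number - name"
patches = [Patch(color=c, label=f'{n} - {name}')
           for n, name, c in zip(numbers, df['names'], colors)]
ax.legend(handles=patches, bbox_to_anchor=(1.04, 0.5), loc='center left', borderaxespad=0)
```

When saving, `plt.savefig(..., bbox_inches='tight')` keeps a legend that sits outside the Axes inside the image.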

Option 2: pandas.DataFrame.plot

  • This option also requires "manually" associating 'names' with the palette and creating the legend using Patch.
  • Choosing Colormaps in Matplotlib
    • This plot uses tab20c
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.patches import Patch

# plt styling parameters ('seaborn-v0_8' was named 'seaborn' before matplotlib 3.6)
plt.style.use('seaborn-v0_8')
plt.rcParams['figure.figsize'] = (16.0, 10.0)
plt.rcParams["patch.force_edgecolor"] = True

# choose a color map with enough colors for the number of bars (tab20c has 20)
colors = cm.tab20c(np.arange(len(df)))

# plot the 'values' column as a Series, which accepts one color per bar
df['values'].plot.bar(color=colors)

# create color map with colors and df.names
cmap = dict(zip(df.names, colors))

# create the rectangles for the legend
patches = [Patch(color=v, label=k) for k, v in cmap.items()]

# add the legend
plt.legend(handles=patches, bbox_to_anchor=(1.04, 0.5), loc='center left', borderaxespad=0)

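Options 1B and 2 both build Patch proxies for the legend. A variation, sketched here with plain matplotlib and made-up data, passes the Rectangle artists returned by `Axes.bar` straight to `legend` as handles, so the legend colors are taken from the bars themselves:

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'names': ['first_long_name', 'second_long_name', 'third_long_name'],
                   'values': [1.5, 3.2, 2.4]})

# One color per bar from a qualitative colormap (tab20 holds 20 colors)
colors = [plt.cm.tab20(i) for i in range(len(df))]

fig, ax = plt.subplots()
bars = ax.bar(df.index, df['values'], color=colors)  # BarContainer of Rectangles

# The Rectangles are real artists, so they can serve as legend handles directly
ax.legend(bars, list(df['names']), bbox_to_anchor=(1.04, 0.5),
          loc='center left', borderaxespad=0)
```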

Reproducible DataFrame

data = {'names': ['CD14_monocytes_treated_with_Group_A_streptococci_donor2.CNhs13532', 'Neutrophils_donor3.CNhs11905', 'CD14_monocytes_treated_with_Trehalose_dimycolate_TDM_donor2.CNhs13483', 'CD14_monocytes_treated_with_Candida_donor1.CNhs13473', 'CD14_Monocytes_donor2.CNhs11954', 'CD14_monocytes_treated_with_Candida_donor2.CNhs13488', 'CD14_Monocytes_donor3.CNhs11997', 'CD14_monocytes_treated_with_Group_A_streptococci_donor1.CNhs13469', 'Eosinophils_donor3.CNhs12549', 'CD14_monocytes_treated_with_lipopolysaccharide_donor1.CNhs13470', 'CD14_monocytes_treated_with_Salmonella_donor1.CNhs13471', 'CD14_monocytes_treated_with_Candida_donor3.CNhs13494', 'CD14_monocytes_-_treated_with_Group_A_streptococci_donor3.CNhs13492', 'Neutrophils_donor1.CNhs10862', 'Neutrophils_donor2.CNhs11959'],
        'values': [1.583428, 1.832527, 1.858384, 1.873013, 2.041607, 2.112112, 2.195365, 2.974203, 3.566822, 3.685389, 4.409062, 5.546789, 5.673991, 8.352045, 11.595509]}

df = pd.DataFrame(data)

Upvotes: 1

wwii

Reputation: 23753

A slightly different route: I made a string mapping the x values to the names and added it to the figure.

Made my own DataFrame for illustration.

from matplotlib import pyplot as plt
import pandas as pd
import string,random
df = pd.DataFrame({'name':[''.join(random.sample(string.ascii_letters,15))
                           for _ in range(10)],
                   'data':[random.randint(1,20) for _ in range(10)]})

Make the plot.

fig,ax = plt.subplots()
ax.bar(df.index,df.data)

Make the legend.

x_legend = '\n'.join(f'{n} - {name}' for n,name in zip(df.index,df['name']))
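With fixed names instead of random ones, it is easier to see what that one-liner builds: one "x - name" line per bar, joined by newlines. A minimal check with three made-up names:

```python
import pandas as pd

df = pd.DataFrame({'name': ['alpha', 'beta', 'gamma'], 'data': [3, 7, 5]})

# One "x - name" line per bar, joined into a single multi-line string
x_legend = '\n'.join(f'{n} - {name}' for n, name in zip(df.index, df['name']))
print(x_legend)
# 0 - alpha
# 1 - beta
# 2 - gamma
```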

Add the legend as a Text artist and adjust the plot to accommodate it.

t = ax.text(.7,.2,x_legend,transform=ax.figure.transFigure)
fig.subplots_adjust(right=.65)

plt.show()
plt.close()



The placement can be made dynamic by measuring the width of the Text artist and the width of the Figure.

# using imports and DataFrame from above
fig,ax = plt.subplots()
r = fig.canvas.get_renderer()

ax.bar(df.index,df.data)
x_legend = '\n'.join(f'{n} - {name}' for n,name in zip(df.index,df['name']))
t = ax.text(0,.1,x_legend,transform=ax.figure.transFigure)

# find the width of the Text and place it on the right side of the Figure
twidth = t.get_window_extent(renderer=r).width
*_,fwidth,fheight = fig.bbox.extents
tx,ty = t.get_position()
tx =  .95 - (twidth/fwidth)
t.set_position((tx,ty))

# adjust the right edge of the plot/Axes
ax_right = tx - .05
fig.subplots_adjust(right=ax_right)

plt.show()
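The same steps could be wrapped in a reusable function. `add_x_legend` below is a hypothetical helper, not part of matplotlib. It uses `fig.text` (figure coordinates by default) in place of `ax.text` with `transform=ax.figure.transFigure`, numbers the lines from 0 on the assumption that the bars sit at a 0-based index, and assumes a canvas that provides `get_renderer()`, as the Agg-based backends do.

```python
import matplotlib.pyplot as plt
import pandas as pd

def add_x_legend(fig, names, right=0.95, pad=0.05, y=0.1):
    """Hypothetical helper: write 'x - name' lines at the right edge of fig
    and shrink the subplots so they do not overlap the text."""
    text = '\n'.join(f'{n} - {name}' for n, name in enumerate(names))
    t = fig.text(0, y, text)  # x is fixed below, once the width is known

    # Measure the rendered text and convert its width to a figure fraction
    renderer = fig.canvas.get_renderer()
    twidth = t.get_window_extent(renderer=renderer).width
    tx = right - twidth / fig.bbox.width
    t.set_x(tx)

    # Move the right edge of the subplots left of the text, with some padding
    fig.subplots_adjust(right=tx - pad)
    return t

df = pd.DataFrame({'name': ['a_fairly_long_name', 'another_long_name', 'short'],
                   'data': [4, 9, 2]})
fig, ax = plt.subplots()
ax.bar(df.index, df['data'])
t = add_x_legend(fig, df['name'])
```

Keeping the text and the subplot adjustment in one function means the padding between the bars and the name list is set in a single place.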

Upvotes: 2
