Make histogram subplots from a pandas DataFrame using the matplotlib library?

Question

I have the following tab-separated data:

CHROM   ms02g:PI    num_Vars_by_PI  range_of_PI total_haplotypes    total_Vars
1   1,2 60,6    2820,81 2   66
2   9,8,10,7,11 94,78,10,69,25  89910,1102167,600,1621365,636   5   276
3   5,3,4,6 6,12,14,17  908,394,759,115656  4   49
4   17,18,22,16,19,21,20    22,11,3,16,7,12,6   1463,171,149,256,157,388,195    7   77
5   13,15,12,14 56,25,96,107    2600821,858,5666,1792   4   284
7   24,26,29,25,27,23,30,28,31  12,31,19,6,12,23,9,37,25    968,3353,489,116,523,1933,823,2655,331  9   174
8   33,32   53,35   1603,2991338    2   88

I am using this code to build histogram subplots, one for each CHROM:

with open(outputdir + '/' + 'hap_size_byVar_'+ soi +'_'+ prefix+'.png', 'wb') as fig_initial:
    fig, ax = plt.subplots(nrows=len(hap_stats), sharex=True)
    for i, data in hap_stats.iterrows():

        # first convert data to list of integers
        data_i = [int(x) for x in data['num_Vars_by_PI'].split(',')]
        ax[i].hist(data_i, label=str(data['CHROM']), alpha=0.5)
        ax[i].legend()

    plt.xlabel('size of the haplotype (number of variants)')
    plt.ylabel('frequency of the haplotypes')
    plt.suptitle('histogram of size of the haplotype (number of variants) \n'
                 'for each chromosome')
    plt.savefig(fig_initial)

Everything works except for two problems:

The Y-label "frequency of the haplotypes" is not positioned properly in the output plot.

When the data contain only one row (see the data below), the subplots cannot be created and I get a TypeError, even though it should be possible to make a subplot with only one index.

Dataframe with only one line of data:

 CHROM  ms02g:PI    num_Vars_by_PI  range_of_PI total_haplotypes    total_Vars
 2  9,8,10,7,11 94,78,10,69,25  89910,1102167,600,1621365,636   5   276

TypeError:

Traceback (most recent call last):
  File "phase-Extender.py", line 1806, in <module>
    main()
  File "phase-Extender.py", line 502, in main
    compute_haplotype_stats(initial_haplotype, soi, prefix='initial')
  File "phase-Extender.py", line 1719, in compute_haplotype_stats
    ax[i].hist(data_i, label=str(data['CHROM']), alpha=0.5)
TypeError: 'AxesSubplot' object does not support indexing

How can I fix these two issues?

Diziet Asahi · Accepted Answer

Your first problem comes from the fact that you are calling plt.ylabel() after your loop. pyplot functions act on the current axes object, which in this case is the last one created by subplots(), so the label is attached to the bottom subplot only. If you want the label to be centered vertically across all of your subplots, the easiest way might be to create a text object centered vertically in the figure.

# replace plt.ylabel('frequency of the haplotypes') with:
fig.text(.02, .5, 'frequency of the haplotypes', ha='center', va='center', rotation='vertical')

You can play around with the x-position (0.02) until you find a position you're happy with. The coordinates are figure coordinates: (0,0) is the bottom left and (1,1) is the top right. Using 0.5 as the y-position ensures the label is centered vertically in the figure.
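As a minimal sketch of the figure-level label described above, the following uses made-up sample lists (the `sizes` variable is hypothetical, not the asker's data) and the non-interactive Agg backend so that it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no display is needed
import matplotlib.pyplot as plt

# Hypothetical sample data: one list of haplotype sizes per subplot
sizes = [[60, 6], [94, 78, 10, 69, 25], [6, 12, 14, 17]]

fig, axes = plt.subplots(nrows=len(sizes), sharex=True)
for ax_i, values in zip(axes, sizes):
    ax_i.hist(values, alpha=0.5)

# With sharex=True, labelling only the bottom axes is enough for the x-axis
axes[-1].set_xlabel('size of the haplotype (number of variants)')

# One label attached to the figure itself, centered vertically at y=0.5
ylabel = fig.text(0.02, 0.5, 'frequency of the haplotypes',
                  ha='center', va='center', rotation='vertical')
```

On Matplotlib 3.4 or later, fig.supylabel('frequency of the haplotypes') should give a similar result without choosing the coordinates by hand.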

The second problem is due to the fact that, when nrows=1 (and ncols is left at its default of 1), plt.subplots() returns the axes object directly instead of an array of axes. There are two ways to work around this problem:

1 - test whether you have only one row, and if so wrap ax in a list:

fig, ax = plt.subplots(nrows=len(hap_stats), sharex=True)
if len(hap_stats)==1:
    ax = [ax]
(...)
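A variation on option 1, sketched here under the assumption that NumPy is available, is to pass the result through np.atleast_1d(), which leaves an array of axes untouched and wraps a single axes object in a one-element array. The make_axes helper is hypothetical and only serves the illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def make_axes(n_rows):
    """Hypothetical helper: return a 1D array of axes, even when n_rows == 1."""
    fig, ax = plt.subplots(nrows=n_rows, sharex=True)
    # A single Axes becomes an array of shape (1,); an existing array is unchanged
    ax = np.atleast_1d(ax)
    return fig, ax


fig_one, ax_one = make_axes(1)
fig_many, ax_many = make_axes(3)

# Indexing now works in both cases
ax_one[0].hist([94, 78, 10, 69, 25], alpha=0.5)
ax_many[2].hist([6, 12, 14, 17], alpha=0.5)
```

This avoids the explicit length check, at the cost of an extra NumPy import.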

2 - use the option squeeze=False in your call to plt.subplots(). As explained in the documentation, this option forces subplots() to always return a 2D array, so you'll have to slightly modify how you index your axes:

fig, ax = plt.subplots(nrows=len(hap_stats), sharex=True, squeeze=False)
for i, data in hap_stats.iterrows():
    (...)
    ax[i,0].hist(data_i, label=str(data['CHROM']), alpha=0.5)
    (...)
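Putting the second option together with the figure-level y-label, a self-contained sketch might look like the following. The function name plot_haplotype_sizes and the small DataFrames are hypothetical stand-ins for the asker's code and data. The sketch also uses enumerate() for the subplot position, on the assumption that the DataFrame index may not always run from 0 to n-1, in which case indexing ax with the iterrows() label would fail:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no display is needed
import matplotlib.pyplot as plt
import pandas as pd


def plot_haplotype_sizes(hap_stats):
    """Hypothetical helper: one histogram subplot per row of hap_stats."""
    # squeeze=False always yields a 2D array of shape (nrows, 1)
    fig, ax = plt.subplots(nrows=len(hap_stats), sharex=True, squeeze=False)
    for pos, (_, row) in enumerate(hap_stats.iterrows()):
        # convert the comma-separated string to a list of integers
        data_i = [int(x) for x in row['num_Vars_by_PI'].split(',')]
        ax[pos, 0].hist(data_i, label=str(row['CHROM']), alpha=0.5)
        ax[pos, 0].legend()
    ax[-1, 0].set_xlabel('size of the haplotype (number of variants)')
    fig.text(0.02, 0.5, 'frequency of the haplotypes',
             ha='center', va='center', rotation='vertical')
    return fig, ax


# A single-row frame, the case that raised the TypeError in the question
one_row = pd.DataFrame({'CHROM': [2], 'num_Vars_by_PI': ['94,78,10,69,25']})
fig_one, ax_one = plot_haplotype_sizes(one_row)

# A multi-row frame with a non-default index
many_rows = pd.DataFrame(
    {'CHROM': [1, 3], 'num_Vars_by_PI': ['60,6', '6,12,14,17']},
    index=[10, 20],
)
fig_many, ax_many = plot_haplotype_sizes(many_rows)
```

The same function handles one row or many, so no special case is needed before calling fig.savefig().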
