Reputation: 7255
I have the following tab-separated data:
CHROM ms02g:PI num_Vars_by_PI range_of_PI total_haplotypes total_Vars
1 1,2 60,6 2820,81 2 66
2 9,8,10,7,11 94,78,10,69,25 89910,1102167,600,1621365,636 5 276
3 5,3,4,6 6,12,14,17 908,394,759,115656 4 49
4 17,18,22,16,19,21,20 22,11,3,16,7,12,6 1463,171,149,256,157,388,195 7 77
5 13,15,12,14 56,25,96,107 2600821,858,5666,1792 4 284
7 24,26,29,25,27,23,30,28,31 12,31,19,6,12,23,9,37,25 968,3353,489,116,523,1933,823,2655,331 9 174
8 33,32 53,35 1603,2991338 2 88
I am using this code to build histogram plots, with one subplot for each CHROM:
with open(outputdir + '/' + 'hap_size_byVar_'+ soi +'_'+ prefix+'.png', 'wb') as fig_initial:
    fig, ax = plt.subplots(nrows=len(hap_stats), sharex=True)
    for i, data in hap_stats.iterrows():
        # first convert data to list of integers
        data_i = [int(x) for x in data['num_Vars_by_PI'].split(',')]
        ax[i].hist(data_i, label=str(data['CHROM']), alpha=0.5)
        ax[i].legend()
    plt.xlabel('size of the haplotype (number of variants)')
    plt.ylabel('frequency of the haplotypes')
    plt.suptitle('histogram of size of the haplotype (number of variants) \n'
                 'for each chromosome')
    plt.savefig(fig_initial)
Everything is fine except two problems:
1 - The y-label 'frequency of the haplotypes' is not positioned properly in the output plot.
2 - When the dataframe has only one line of data, I get a TypeError, even though it should be able to make the subplot with only one index.
Dataframe with only one line of data:
CHROM ms02g:PI num_Vars_by_PI range_of_PI total_haplotypes total_Vars
2 9,8,10,7,11 94,78,10,69,25 89910,1102167,600,1621365,636 5 276
TypeError:
Traceback (most recent call last):
  File "phase-Extender.py", line 1806, in <module>
    main()
  File "phase-Extender.py", line 502, in main
    compute_haplotype_stats(initial_haplotype, soi, prefix='initial')
  File "phase-Extender.py", line 1719, in compute_haplotype_stats
    ax[i].hist(data_i, label=str(data['CHROM']), alpha=0.5)
TypeError: 'AxesSubplot' object does not support indexing
How can I fix these two issues?
Upvotes: 0
Views: 310
Reputation: 40667
Your first problem comes from the fact that you are calling plt.ylabel() after your loop. pyplot functions act on the currently active axes object, which, in this case, is the last one created by subplots(). If you want your label to be centered alongside all of your subplots, the easiest might be to create a text object centered vertically in the figure.
# replace plt.ylabel('frequency of the haplotypes') with:
fig.text(.02, .5, 'frequency of the haplotypes', ha='center', va='center', rotation='vertical')
You can play around with the x-position (0.02) until you find a position you're happy with. The coordinates are figure coordinates: (0,0) is bottom left and (1,1) is top right. Using 0.5 as the y-position ensures the label is centered vertically in the figure.
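As a rough, self-contained sketch of this approach: the dictionary `sizes_by_chrom` below is a hypothetical stand-in for the parsed `num_Vars_by_PI` column (values copied from the first three rows of the question's table), and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

# Hypothetical stand-in for the parsed 'num_Vars_by_PI' column
# (values from the first three rows of the question's table)
sizes_by_chrom = {
    "1": [60, 6],
    "2": [94, 78, 10, 69, 25],
    "3": [6, 12, 14, 17],
}

fig, axes = plt.subplots(nrows=len(sizes_by_chrom), sharex=True)
for ax, (chrom, sizes) in zip(axes, sizes_by_chrom.items()):
    ax.hist(sizes, label=chrom, alpha=0.5)
    ax.legend()

# The x axis is shared, so label only the bottom axes
axes[-1].set_xlabel("size of the haplotype (number of variants)")

# One y label for the whole figure, positioned in figure coordinates
ylabel = fig.text(0.02, 0.5, "frequency of the haplotypes",
                  ha="center", va="center", rotation="vertical")
```

If your Matplotlib version is 3.4 or newer, fig.supylabel('frequency of the haplotypes') should give a similar result without choosing the coordinates by hand.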
The second problem is due to the fact that, when nrows=1 (with the default ncols=1), plt.subplots() returns the axes object directly, instead of an array of axes. There are two options to circumvent this problem:
1 - test whether you have only one line, and then replace ax with a list:
fig, ax = plt.subplots(nrows=len(hap_stats), sharex=True)
if len(hap_stats)==1:
    ax = [ax]
(...)
2 - use the option squeeze=False in your call to plt.subplots(). As explained in the documentation, using this option will force subplots() to always return a 2D array. Therefore you'll have to slightly modify how you index your axes:
fig, ax = plt.subplots(nrows=len(hap_stats), sharex=True, squeeze=False)
for i, data in hap_stats.iterrows():
    (...)
    ax[i,0].hist(data_i, label=str(data['CHROM']), alpha=0.5)
    (...)
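To illustrate the squeeze=False option end to end, here is a minimal sketch that works for one row as well as for several. The helper `plot_histograms` and its dictionary input are hypothetical (they are not part of the question's code), and the Agg backend is chosen only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt


def plot_histograms(sizes_by_chrom):
    """Hypothetical helper: draw one histogram subplot per chromosome."""
    # squeeze=False makes subplots() return a 2D array of axes,
    # even when there is only a single row
    fig, axes = plt.subplots(nrows=len(sizes_by_chrom), sharex=True,
                             squeeze=False)
    # enumerate() gives a positional row index, independent of any labels
    for row, (chrom, sizes) in enumerate(sizes_by_chrom.items()):
        axes[row, 0].hist(sizes, label=chrom, alpha=0.5)
        axes[row, 0].legend()
    return fig, axes


# Single-row case (the one that raised the TypeError in the question)
fig_one, axes_one = plot_histograms({"2": [94, 78, 10, 69, 25]})

# Multi-row case goes through exactly the same code path
fig_two, axes_two = plot_histograms({"1": [60, 6], "2": [94, 78, 10, 69, 25]})
```

The sketch uses enumerate() for the row position on purpose: iterrows() yields the dataframe's index labels, which only coincide with positions when the index is a default 0..n-1 range. If your single-row dataframe comes from filtering a larger one, its label may not be 0, so a positional counter is the safer way to index the axes.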
Upvotes: 2