plot histogram from pandas dataframe using the list values in (column, row) pairs

Question

I want to plot histograms (both overlaid on a single plot and in separate plots, one per chromosome) from a pandas DataFrame with the following columns.

import numpy as np
import pandas as pd

my_cols = ['chrom', 'len_PIs']
chrom = pd.Series(['chr1', 'chr2', 'chr3'])
len_of_PIs = pd.Series([[np.random.randint(15, 59, 86)],
                    [np.random.randint(18, 55, 92)],
                    [np.random.randint(25, 61, 98)]])

my_df = pd.DataFrame({'chrom': chrom,
                'len_PIs': len_of_PIs},
                 columns=my_cols)

print('\nhere is df5')
print(my_df)
print(type(my_df))
print(type(my_df['len_PIs']))

here is df5
  chrom                                            len_PIs
0  chr1  [[18, 45, 33, 58, 48, 47, 45, 39, 42, 46, 48, ...
1  chr2  [[45, 32, 49, 46, 53, 40, 46, 35, 44, 24, 51, ...
2  chr3  [[53, 32, 35, 35, 49, 31, 57, 42, 46, 49, 49, ...

So now I want to make a histogram for each chrom using the len_PIs values.

import matplotlib.pyplot as plt

with open('histogram_byChr.png', 'w'):
    fig = plt.figure()
    plt.subplot()
    plt.xlabel('chrom')
    plt.ylabel('len_PIs')
    fig.suptitle('length of PIs distribution for each chromosome')

    # these two methods (below) are close but don't work

    plt.plot(my_df.groupby('chrom')['len_PIs'])
    # error message, which doesn't make sense to me:
    # ValueError: could not convert string to float: 'chr3'

    my_df.groupby('chrom').plot.hist(alpha=0.5)
    # error message:
    # TypeError: Empty 'DataFrame': no numeric data to plot

ImportanceOfBeingErnest · Accepted Answer

The data is stored in the dataframe in a rather unusual way: each cell of len_PIs holds a nested list instead of a single number. Still, you can simply iterate over the rows and plot a histogram for each one.

## Plot all three histograms in a single plot
fig, ax = plt.subplots()
for i, data in my_df.iterrows():
    ax.hist(data["len_PIs"], label=data['chrom'], alpha=.5)
ax.legend()
plt.show()

## Plot each histogram in its own subplot
fig, axes = plt.subplots(nrows=len(my_df), sharex=True)
for i, data in my_df.iterrows():
    axes[i].hist(data["len_PIs"], label=data['chrom'], alpha=.5)
    axes[i].legend()
plt.show()
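
If reshaping the data is an option, one possible alternative is to flatten the nested cells into a "long" frame with one length value per row, so the column becomes numeric and groupby can be used. The following is only a sketch under that assumption; the name long_df is made up for illustration, and the random values stand in for the real data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Rebuild a frame shaped like the one in the question (random stand-in values).
np.random.seed(0)
chrom = pd.Series(['chr1', 'chr2', 'chr3'])
len_of_PIs = pd.Series([[np.random.randint(15, 59, 86)],
                        [np.random.randint(18, 55, 92)],
                        [np.random.randint(25, 61, 98)]])
my_df = pd.DataFrame({'chrom': chrom, 'len_PIs': len_of_PIs})

# Flatten each nested cell into a plain list, then give every value its own row.
# long_df is a hypothetical name, not something from the original post.
flat = my_df['len_PIs'].map(lambda cell: np.ravel(cell).tolist())
long_df = (my_df.assign(len_PIs=flat)
                .explode('len_PIs')
                .astype({'len_PIs': int})
                .reset_index(drop=True))

# The column is numeric now, so each group can be passed straight to hist().
fig, ax = plt.subplots()
for name, values in long_df.groupby('chrom')['len_PIs']:
    ax.hist(values, alpha=0.5, label=name)
ax.set_xlabel('len_PIs')
ax.set_ylabel('count')
ax.legend()
```

Call plt.show() or fig.savefig('histogram_byChr.png') afterwards to display or save the figure. Whether the reshaping is worth it depends on what else is done with the frame; for plotting alone, iterating over the rows as shown above is just as good.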
