everestial

Reputation: 7255

plot histogram from pandas dataframe using the list values in (column, row) pairs

I want to plot histograms (both overlaid in a single plot and separately, one per chromosome) from a pandas DataFrame with the following columns.

import numpy as np
import pandas as pd

my_cols = ['chrom', 'len_PIs']
chrom = pd.Series(['chr1', 'chr2', 'chr3'])
len_of_PIs = pd.Series([[np.random.randint(15, 59, 86)],
                        [np.random.randint(18, 55, 92)],
                        [np.random.randint(25, 61, 98)]])

my_df = pd.DataFrame({'chrom': chrom,
                      'len_PIs': len_of_PIs},
                     columns=my_cols)

print('\nhere is my_df')
print(my_df)
print(type(my_df))
print(type(my_df['len_PIs']))

here is my_df
  chrom                                            len_PIs
0  chr1  [[18, 45, 33, 58, 48, 47, 45, 39, 42, 46, 48, ...
1  chr2  [[45, 32, 49, 46, 53, 40, 46, 35, 44, 24, 51, ...
2  chr3  [[53, 32, 35, 35, 49, 31, 57, 42, 46, 49, 49, ...
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>

So now I want to make a histogram for each chrom using its len_PIs values.

import matplotlib.pyplot as plt

with open('histogram_byChr.png', 'w'):
    fig = plt.figure()
    plt.subplot()
    plt.xlabel('chrom')
    plt.ylabel('len_PIs')
    fig.suptitle('length of PIs distribution for each chromosome')

    # these two methods (below) are close but don't work

    plt.plot(my_df.groupby('chrom')['len_PIs'])
    # error message, which doesn't make sense to me:
    # ValueError: could not convert string to float: 'chr3'

    my_df.groupby('chrom').plot.hist(alpha=0.5)
    # error message:
    # TypeError: Empty 'DataFrame': no numeric data to plot

Upvotes: 0

Views: 3462

Answers (2)

ImportanceOfBeingErnest

Reputation: 339480

The data is stored in an unusual way: each cell of len_PIs holds a list wrapping a single array. You can still iterate over the rows and plot a histogram for each one.

## Plot all three histograms in a single plot
fig, ax = plt.subplots()
for i, data in my_df.iterrows():
    ax.hist(data["len_PIs"], label=data['chrom'], alpha=.5)
ax.legend()
plt.show()

## Plot each histogram in its own subplot
fig, axes = plt.subplots(nrows=len(my_df), sharex=True)
for i, data in my_df.iterrows():
    axes[i].hist(data["len_PIs"], label=data['chrom'], alpha=.5)
    axes[i].legend()
plt.show()
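As a variation on the loop above, here is a minimal sketch that flattens each cell first, uses common bin edges so the overlaid histograms are directly comparable, and writes the figure to disk. The seed, the choice of 15 bins, the axis labels, and the use of np.histogram_bin_edges are my assumptions, not part of the question. Note that fig.savefig writes the file itself, so the `with open(...)` wrapper from the question is not needed.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, we only write a file
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild data shaped like the question's: each cell is a list holding one array.
rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
my_df = pd.DataFrame({
    "chrom": pd.Series(["chr1", "chr2", "chr3"]),
    "len_PIs": pd.Series([[rng.integers(15, 59, 86)],
                          [rng.integers(18, 55, 92)],
                          [rng.integers(25, 61, 98)]]),
})

# Flatten every cell ([array] -> 1-D array), keyed by chromosome name.
flat = {row["chrom"]: np.ravel(row["len_PIs"]) for _, row in my_df.iterrows()}

# Compute one set of bin edges from all values so every histogram shares them.
all_values = np.concatenate(list(flat.values()))
bins = np.histogram_bin_edges(all_values, bins=15)

fig, ax = plt.subplots()
for chrom_name, values in flat.items():
    ax.hist(values, bins=bins, alpha=0.5, label=chrom_name)
ax.set_xlabel("len_PIs")
ax.set_ylabel("count")
ax.legend()

# savefig opens and writes the file on its own.
fig.savefig("histogram_byChr.png")
```

Without shared bins, each call to hist picks edges from its own data range, which can make overlapping bars hard to compare.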


Upvotes: 1

cs95

Reputation: 402852

You'll need to reshape the data first. Expand the list column into separate columns, with one row per chromosome (df here is the question's DataFrame):

df = pd.DataFrame(
        pd.DataFrame(df.len_PIs.tolist())[0].tolist(), index=df.chrom
)

df    
       0   1   2   3   4   5   6   7   8   9   ...     88    89    90    91  \
chrom                                          ...                            
chr1   58  15  55  53  40  25  49  38  47  34  ...    NaN   NaN   NaN   NaN   
chr2   37  42  24  38  24  46  24  20  46  46  ...   43.0  54.0  44.0  22.0   
chr3   35  37  58  57  58  51  60  50  49  43  ...   37.0  32.0  41.0  54.0   

         92    93    94    95    96    97  
chrom                                      
chr1    NaN   NaN   NaN   NaN   NaN   NaN  
chr2    NaN   NaN   NaN   NaN   NaN   NaN  
chr3   25.0  48.0  40.0  35.0  28.0  28.0  

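Next, stack the columns into a single long Series indexed by (chrom, position); in pandas versions where stack drops NaN by default, this also removes the padding. Finally, call groupby + plot.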

df.stack().groupby(level=0).plot.hist(alpha=0.5, legend=True);
plt.show()
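A different route to the same long layout, sketched here as an alternative rather than as the method above: build a two-column frame directly with np.repeat and np.concatenate, so no NaN padding is ever created. The names flat_arrays and long_df and the fixed seed are hypothetical, introduced only for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for a script run
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Data shaped like the question's: each cell is a list holding one array.
rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
my_df = pd.DataFrame({
    "chrom": pd.Series(["chr1", "chr2", "chr3"]),
    "len_PIs": pd.Series([[rng.integers(15, 59, 86)],
                          [rng.integers(18, 55, 92)],
                          [rng.integers(25, 61, 98)]]),
})

# Flatten each cell to a 1-D array.
flat_arrays = [np.ravel(cell) for cell in my_df["len_PIs"]]

# Long format: one row per value, with its chromosome label repeated alongside.
long_df = pd.DataFrame({
    "chrom": np.repeat(my_df["chrom"].to_numpy(), [len(a) for a in flat_arrays]),
    "len_PIs": np.concatenate(flat_arrays),
})

# Same groupby + plot idea as above, now on a plain numeric column.
long_df.groupby("chrom")["len_PIs"].plot.hist(alpha=0.5, legend=True)
plt.xlabel("len_PIs")
```

In an interactive session, follow this with plt.show() as in the snippet above. The long frame is also convenient for summaries such as long_df.groupby("chrom")["len_PIs"].describe().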


Upvotes: 1
