I want to make histograms (both overlaid and non-overlaid across chromosomes) from a pandas DataFrame with the following columns:
```python
import numpy as np
import pandas as pd

my_cols = ['chrom', 'len_PIs']
chrom = pd.Series(['chr1', 'chr2', 'chr3'])
len_of_PIs = pd.Series([[np.random.randint(15, 59, 86)],
                        [np.random.randint(18, 55, 92)],
                        [np.random.randint(25, 61, 98)]])
my_df = pd.DataFrame({'chrom': chrom,
                      'len_PIs': len_of_PIs},
                     columns=my_cols)

print('\nhere is my_df')
print(my_df)
print(type(my_df))
print(type(my_df['len_PIs']))
```

```
here is my_df
  chrom                                            len_PIs
0  chr1  [[18, 45, 33, 58, 48, 47, 45, 39, 42, 46, 48, ...
1  chr2  [[45, 32, 49, 46, 53, 40, 46, 35, 44, 24, 51, ...
2  chr3  [[53, 32, 35, 35, 49, 31, 57, 42, 46, 49, 49, ...
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>
```
Now I want to make a histogram of the `len_PIs` values for each `chrom`.
```python
import matplotlib.pyplot as plt

with open('histogram_byChr.png', 'w'):
    fig = plt.figure()
    plt.subplot()
    plt.xlabel('chrom')
    plt.ylabel('len_PIs')
    fig.suptitle('length of PIs distribution for each chromosome')

    # these two methods (below) come close but don't work
    plt.plot(my_df.groupby('chrom')['len_PIs'])
    # error message, which doesn't make sense to me:
    # ValueError: could not convert string to float: 'chr3'

    my_df.groupby('chrom').plot.hist(alpha=0.5)
    # error message:
    # TypeError: Empty 'DataFrame': no numeric data to plot
```
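The second error follows from how the cells are stored. A minimal check, assuming a rebuild of `my_df` with a seeded generator (the `default_rng(0)` seed is an addition for reproducibility, not part of the original): each cell is a Python list wrapping one NumPy array, so pandas keeps `len_PIs` as an `object` column and `plot.hist` finds nothing numeric to plot.

```python
import numpy as np
import pandas as pd

# Assumption: seeded stand-in for the question's np.random.randint data
rng = np.random.default_rng(0)

# Same layout as the question: every cell is a list wrapping one int array
my_df = pd.DataFrame({
    'chrom': ['chr1', 'chr2', 'chr3'],
    'len_PIs': [[rng.integers(15, 59, 86)],
                [rng.integers(18, 55, 92)],
                [rng.integers(25, 61, 98)]],
})

cell = my_df.loc[0, 'len_PIs']
print(type(cell), len(cell), type(cell[0]), cell[0].shape)
# <class 'list'> 1 <class 'numpy.ndarray'> (86,)

print(my_df['len_PIs'].dtype)  # object: the lists are opaque Python objects
print(my_df.select_dtypes('number').columns.tolist())  # [] -> no numeric column
```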
The data is stored rather unusually in the DataFrame (each cell is a list wrapping one array). Still, you can simply iterate over the rows and plot the respective histograms.
## Plot all three histograms in a single plot
```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
for i, data in my_df.iterrows():
    ax.hist(data["len_PIs"], label=data['chrom'], alpha=.5)
ax.legend()
plt.show()
```
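One caveat with the overlaid version: each `ax.hist` call picks its own 10 bins from that chromosome's own minimum and maximum, so the bars of different chromosomes don't line up. A minimal sketch, assuming the same data layout (rebuilt with a seeded generator), a headless `Agg` backend, and an arbitrary choice of 15 bins, computes one shared set of edges with `np.histogram_bin_edges` first:

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded stand-in for the question's random data
my_df = pd.DataFrame({
    'chrom': ['chr1', 'chr2', 'chr3'],
    'len_PIs': [[rng.integers(15, 59, 86)],
                [rng.integers(18, 55, 92)],
                [rng.integers(25, 61, 98)]],
})

# Flatten each cell (a list wrapping one array) into a plain 1-D array
values = {row['chrom']: np.concatenate(row['len_PIs'])
          for _, row in my_df.iterrows()}

# One shared set of bin edges spanning every chromosome (15 bins is arbitrary)
all_vals = np.concatenate(list(values.values()))
bins = np.histogram_bin_edges(all_vals, bins=15)

fig, ax = plt.subplots()
for chrom, vals in values.items():
    ax.hist(vals, bins=bins, alpha=0.5, label=chrom)
ax.set_xlabel('len_PIs')
ax.set_ylabel('count')
ax.legend()
fig.suptitle('length of PIs distribution for each chromosome')
```

To view it interactively, drop the `matplotlib.use('Agg')` line and finish with `plt.show()` as in the answer.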
## Plot each histogram in its own subplot
```python
fig, axes = plt.subplots(nrows=len(my_df), sharex=True)
for i, data in my_df.iterrows():
    axes[i].hist(data["len_PIs"], label=data['chrom'], alpha=.5)
    axes[i].legend()
plt.show()
```
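The question wrapped the plotting in `with open('histogram_byChr.png', 'w')`, which only creates an empty text file; matplotlib writes the image itself through `Figure.savefig`. A sketch of the subplot version that saves to that filename, assuming a headless `Agg` backend and the same seeded stand-in data (the `figsize` and `dpi` values are arbitrary):

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend; savefig works without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded stand-in for the question's random data
my_df = pd.DataFrame({
    'chrom': ['chr1', 'chr2', 'chr3'],
    'len_PIs': [[rng.integers(15, 59, 86)],
                [rng.integers(18, 55, 92)],
                [rng.integers(25, 61, 98)]],
})

fig, axes = plt.subplots(nrows=len(my_df), sharex=True, figsize=(6, 8))
# zip pairs each Axes with a row, so this does not rely on a 0..n-1 index
for ax, (_, row) in zip(axes, my_df.iterrows()):
    ax.hist(np.concatenate(row['len_PIs']), label=row['chrom'], alpha=0.5)
    ax.set_ylabel('count')
    ax.legend()
axes[-1].set_xlabel('len_PIs')
fig.suptitle('length of PIs distribution for each chromosome')

# savefig writes the PNG itself; no open() call is needed
fig.savefig('histogram_byChr.png', dpi=100)
plt.close(fig)  # free the figure once it has been written
```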
You'll need to reshape the data a bit first. Expand the list column into separate columns:
```python
df = pd.DataFrame(
    pd.DataFrame(my_df.len_PIs.tolist())[0].tolist(), index=my_df.chrom
)
```
```
df

        0   1   2   3   4   5   6   7   8   9  ...    88    89    90    91  \
chrom                                          ...
chr1   58  15  55  53  40  25  49  38  47  34  ...   NaN   NaN   NaN   NaN
chr2   37  42  24  38  24  46  24  20  46  46  ...  43.0  54.0  44.0  22.0
chr3   35  37  58  57  58  51  60  50  49  43  ...  37.0  32.0  41.0  54.0

         92    93    94    95    96    97
chrom
chr1    NaN   NaN   NaN   NaN   NaN   NaN
chr2    NaN   NaN   NaN   NaN   NaN   NaN
chr3   25.0  48.0  40.0  35.0  28.0  28.0
```
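The one-liner above packs three steps together. Unpacked with intermediate names (`step1` and `arrays` are hypothetical names introduced here, and the data is a seeded stand-in), it reads as follows; the shorter rows are padded with `NaN`, which is why the trailing columns in the output turn into floats:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded stand-in for the question's random data
my_df = pd.DataFrame({
    'chrom': ['chr1', 'chr2', 'chr3'],
    'len_PIs': [[rng.integers(15, 59, 86)],
                [rng.integers(18, 55, 92)],
                [rng.integers(25, 61, 98)]],
})

# Step 1: each cell is [array], so a frame built from the cells has a single column, 0
step1 = pd.DataFrame(my_df.len_PIs.tolist())
# Step 2: column 0 holds the bare arrays, one per chromosome
arrays = step1[0].tolist()
# Step 3: ragged arrays become rows; shorter rows are padded with NaN
df = pd.DataFrame(arrays, index=my_df.chrom)

print(step1.shape)                       # (3, 1)
print(df.shape)                          # (3, 98)
print(df.notna().sum(axis=1).tolist())   # [86, 92, 98]
```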
Next, `stack` your data into one long Series indexed by (`chrom`, column position). Finally, call `groupby` + `plot`:
```python
df.stack().groupby(level=0).plot.hist(alpha=0.5, legend=True)
plt.show()
```
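The wide intermediate frame isn't strictly required. A sketch of an alternative, assuming pandas 0.25 or later (where `DataFrame.explode` was added) and the same seeded stand-in data: explode the column twice (once to unwrap the outer list, once to spread each array into rows), cast back to integers, and the resulting long frame gives both the overlaid plot and one subplot per chromosome via `DataFrame.hist(by=...)`:

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded stand-in for the question's random data
my_df = pd.DataFrame({
    'chrom': ['chr1', 'chr2', 'chr3'],
    'len_PIs': [[rng.integers(15, 59, 86)],
                [rng.integers(18, 55, 92)],
                [rng.integers(25, 61, 98)]],
})

# First explode unwraps the outer list, second spreads each array into rows
long_df = my_df.explode('len_PIs').explode('len_PIs')
long_df['len_PIs'] = long_df['len_PIs'].astype(int)  # explode leaves object dtype

# Overlaid: one histogram per chromosome on the same axes
fig, ax = plt.subplots()
long_df.groupby('chrom')['len_PIs'].plot.hist(ax=ax, alpha=0.5, legend=True)

# Non-overlaid: DataFrame.hist draws one subplot per value of `by`
axes = long_df.hist(column='len_PIs', by='chrom', sharex=True, layout=(3, 1))
```

The long layout keeps one row per measurement, so the ragged lengths never need `NaN` padding.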