Reputation: 63
I have a pandas dataframe and I want to summarize/reorganize it to produce a figure. I think what I'm looking for involves groupby.
Here's what my dataframe df looks like:
Channel Flag
1 pass
2 pass
3 pass
1 pass
2 pass
3 pass
1 fail
2 fail
3 fail
And this is what I want my dataframe to look like:
Channel pass fail
1 2 1
2 2 1
3 2 1
Running the following code gives something "close", but not in the format I would like:
In [12]: df.groupby(['Channel', 'Flag']).size()
Out[12]:
Channel  Flag
1        fail    1
         pass    2
2        fail    1
         pass    2
3        fail    1
         pass    2
Maybe this output is actually fine for making my plot. It's just that I already have code that plots the data in the wide format shown earlier (one column per flag). I'm adding the code in case it's relevant:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# FastqPlots and out (the output directory) are defined elsewhere in my script

df_all = pd.DataFrame()
df_all['All'] = df['Pass'] + df['Fail']
df_pass = df[['Pass']]  # The double square brackets keep the column name
df_fail = df[['Fail']]
maxval = max(df_pass.index)  # maximum channel value
layout = FastqPlots.make_layout(maxval=maxval)
value_cts = pd.Series(df_pass['Pass'])
for entry in value_cts.keys():
    layout.template[np.where(layout.structure == entry)] = value_cts[entry]
fig, ax = plt.subplots()  # fig/ax come from earlier in my script; created here so the snippet is complete
sns.heatmap(data=pd.DataFrame(layout.template, index=layout.yticks, columns=layout.xticks),
            xticklabels="auto", yticklabels="auto",
            square=True,
            cbar_kws={"orientation": "horizontal"},
            cmap='Blues',
            linewidths=0.20,
            ax=ax)
ax.set_title("Pass reads output per channel")
plt.tight_layout()  # Get rid of extra margins around the plot
fig.savefig(out + "/channel_output_all.png")
Any help/advice would be much appreciated. Thanks!
Upvotes: 1
Views: 42
Reputation: 1734
df.groupby(['Channel', 'Flag'], as_index=False).size().pivot(index='Channel', columns='Flag', values='size')
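In case it helps, here is a minimal self-contained sketch of the same reshaping, assuming the sample data from the question. It swaps in unstack() on the groupby result instead of pivot, and also shows pd.crosstab as an alternative; the variable names counts and same are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Channel": [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "Flag": ["pass"] * 6 + ["fail"] * 3,
})

# Count rows per (Channel, Flag), then move the Flag level into columns.
counts = (
    df.groupby(["Channel", "Flag"])
      .size()
      .unstack(fill_value=0)  # a missing Channel/Flag combination becomes 0
      .reindex(columns=["pass", "fail"], fill_value=0)  # column order from the question
)

# pd.crosstab builds the same table of counts in one call.
same = pd.crosstab(df["Channel"], df["Flag"])[["pass", "fail"]]

print(counts)
```

Using fill_value=0 matters if some channel never fails (or never passes): without it that cell would be NaN rather than 0. Note that the plotting code in the question indexes 'Pass' and 'Fail' with capital letters, so the columns may need renaming, e.g. counts.rename(columns=str.capitalize).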
Upvotes: 1