RedTornado

Reputation: 73

How to plot/manage 2-column categorical data using pandas/matplotlib?

I have a dataset representing a set of posts. Each post has one of 4 categories and one of 6 results.

I want to see how many posts have each of the 6 results within each of the 4 categories.

I used

df = df.groupby(["Category", "Result"]).size().reset_index(name='Count')

to get a 3-column dataframe with the necessary counts. Now I want to plot a grouped bar chart for all the categories, where the x ticks are the categories and each category has 6 bars, one per result.
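For reference, here is a minimal sketch of what that groupby produces, run on a hypothetical six-post toy dataset (the data are made up for illustration). Note that only (Category, Result) pairs that actually occur get a row, so a category with zero posts for some result simply has no row for it.

```python
import pandas as pd

# Hypothetical toy data: one row per post, with its category and result.
df = pd.DataFrame({
    "Category": ["A", "A", "B", "A", "C", "B"],
    "Result":   [1,   2,   1,   1,   3,   1],
})

# size() counts the rows in each (Category, Result) group;
# reset_index turns the grouped Series back into a 3-column DataFrame.
counts = df.groupby(["Category", "Result"]).size().reset_index(name="Count")
print(counts)
```

Here `counts` has four rows: (A, 1) with 2 posts, (A, 2) with 1, (B, 1) with 2 and (C, 3) with 1. Pairs such as (B, 2) are missing rather than listed with a count of 0.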

How can I achieve this?

Upvotes: 4

Views: 3947

Answers (1)

ImportanceOfBeingErnest

Reputation: 339200

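Create a pivot table from the counts dataframe, with one row per category and one column per result. `DataFrame.plot(kind="bar")` draws one group of bars per row (the index values become the x ticks) and one bar per column within each group, which is exactly the grouped bar chart you want.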
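To see the pivot step in isolation, here is a small sketch on hypothetical, hand-written counts shaped like the groupby output. `fill_value=0` turns the missing (Category, Result) pairs into explicit zeros, so every category ends up with a bar slot for every result.

```python
import pandas as pd

# Hypothetical long-format counts, shaped like groupby(...).size().reset_index().
counts = pd.DataFrame({
    "Category": ["A", "A", "B", "C"],
    "Result":   [1,   2,   1,   3],
    "Count":    [2,   1,   2,   1],
})

# One row per category, one column per result; missing pairs become 0.
wide = pd.pivot_table(counts, values="Count", index="Category",
                      columns="Result", aggfunc="sum", fill_value=0)
print(wide)
```

Calling `wide.plot(kind="bar")` on this table would put A, B and C on the x axis with three bars each (results 1, 2 and 3).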

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Sample data: 100 posts, each with one of 4 categories and one of 6 results
cats = np.random.choice(list("ABCD"), 100, p=[0.3, 0.1, 0.4, 0.2])
res = np.random.choice(np.arange(1, 7), 100, p=[0.2, 0.1, 0.08, 0.16, 0.26, 0.2])
df = pd.DataFrame({"Category": cats, "Result": res})

# Long format: one row per (Category, Result) pair with its count
df2 = df.groupby(["Category", "Result"]).size().reset_index(name="Count")

# Wide format: categories as rows, results as columns (missing pairs -> 0)
df3 = pd.pivot_table(df2, values="Count", index="Category", columns="Result",
                     aggfunc="sum", fill_value=0)
# Transposed view: results as rows, categories as columns
df4 = pd.pivot_table(df2, values="Count", index="Result", columns="Category",
                     aggfunc="sum", fill_value=0)

fig, ax = plt.subplots(1, 2, figsize=(10, 4))
df3.plot(kind="bar", ax=ax[0])  # x ticks = categories, 6 bars each
df4.plot(kind="bar", ax=ax[1])  # x ticks = results, 4 bars each

plt.show()
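If you don't need the intermediate 3-column dataframe, the same wide table can be built in one step. The sketch below (using a seeded random generator, an addition for reproducibility) compares the pivot-table route with two shortcuts: `unstack(fill_value=0)` on the grouped sizes, and `pd.crosstab`, which counts co-occurrences of two columns directly.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the sample is reproducible
df = pd.DataFrame({
    "Category": rng.choice(list("ABCD"), 100, p=[0.3, 0.1, 0.4, 0.2]),
    "Result": rng.choice(np.arange(1, 7), 100, p=[0.2, 0.1, 0.08, 0.16, 0.26, 0.2]),
})

# Route from the answer: long counts, then pivot_table.
long_counts = df.groupby(["Category", "Result"]).size().reset_index(name="Count")
via_pivot = pd.pivot_table(long_counts, values="Count", index="Category",
                           columns="Result", aggfunc="sum", fill_value=0)

# Shortcut 1: unstack the grouped sizes directly; missing pairs become 0.
via_unstack = df.groupby(["Category", "Result"]).size().unstack(fill_value=0)

# Shortcut 2: crosstab counts how often each (Category, Result) pair occurs.
via_crosstab = pd.crosstab(df["Category"], df["Result"])
```

Any of these tables can be passed to `.plot(kind="bar")` in the same way as `df3` above; transpose it with `.T` to get the `df4` layout.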


Upvotes: 4
