Sang Đinh

Reputation: 1

How to make a stacked plot from a DataFrame with categorical columns

I have a DataFrame:

    loan_status  Principal
244     PAIDOFF       1000
245     PAIDOFF       1000
246     PAIDOFF       1000
247     PAIDOFF       1000
248     PAIDOFF       1000
249     PAIDOFF       1000
250     PAIDOFF        800
252     PAIDOFF       1000
253     PAIDOFF       1000
254     PAIDOFF       1000
255     PAIDOFF       1000
256     PAIDOFF        800
257     PAIDOFF       1000
258     PAIDOFF       1000
259     PAIDOFF       1000
260  COLLECTION       1000
261  COLLECTION       1000
262  COLLECTION        800
263  COLLECTION        800
264  COLLECTION        800
265  COLLECTION       1000
266  COLLECTION       1000

and I want the result as a stacked plot like this:

[Image: desired stacked plot (not available)]

I hope you can help. Thank you!
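To make the answers below reproducible, the sample data can be rebuilt as a DataFrame. This is a minimal sketch that assumes the printed rows are the whole DataFrame (index 251 is absent from the printout, so it is skipped here too); the helper lists `paid` and `coll` are illustrative names only.

```python
import pandas as pd

# Principal values in the order printed in the question
paid = [1000] * 6 + [800] + [1000] * 4 + [800] + [1000] * 3  # indices 244-250, 252-259
coll = [1000, 1000, 800, 800, 800, 1000, 1000]               # indices 260-266

df = pd.DataFrame(
    {
        "loan_status": ["PAIDOFF"] * len(paid) + ["COLLECTION"] * len(coll),
        "Principal": paid + coll,
    },
    # 251 is missing from the question's output, so it is left out here as well
    index=[*range(244, 251), *range(252, 267)],
)
print(df)
```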

Upvotes: 0

Views: 955

Answers (2)

Patrick FitzGerald

Reputation: 3660

With pandas, you can create a crosstab of the two variables, which gives you the counts by default. If one of the variables is numerical, you can instead aggregate its values with a function. A stacked bar chart can then be plotted directly from the table, as in the following example, where the 'Principal' values are summed:

import pandas as pd    # v 1.1.3
import matplotlib.pyplot as plt

# df is the DataFrame from the question.
# Note that if the 'values' and 'aggfunc' arguments are omitted, the
# table will contain the counts
ctab = pd.crosstab(index=df['Principal'], columns=df['loan_status'],
                   values=df['Principal'], aggfunc='sum')
ctab.plot.bar(stacked=True)
plt.show()

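A self-contained sketch comparing the two forms of `pd.crosstab` mentioned above: the default (counts) and the `values`/`aggfunc='sum'` form (totals). It rebuilds the question's data and assumes the non-interactive `Agg` backend so it runs without a display; the names `counts`, `totals` and `ax` are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import pandas as pd

# Sample data from the question, rebuilt so this block runs on its own
paid = [1000] * 6 + [800] + [1000] * 4 + [800] + [1000] * 3
coll = [1000, 1000, 800, 800, 800, 1000, 1000]
df = pd.DataFrame(
    {"loan_status": ["PAIDOFF"] * len(paid) + ["COLLECTION"] * len(coll),
     "Principal": paid + coll},
    index=[*range(244, 251), *range(252, 267)],
)

# Default crosstab: number of rows for each (Principal, loan_status) pair
counts = pd.crosstab(index=df["Principal"], columns=df["loan_status"])

# With values/aggfunc: total Principal for each pair
totals = pd.crosstab(index=df["Principal"], columns=df["loan_status"],
                     values=df["Principal"], aggfunc="sum")

# One bar per Principal value, one stacked segment per loan_status column
ax = totals.plot.bar(stacked=True)
print(counts)
print(totals)
```

Swapping `totals` for `counts` in the plot call gives the stacked chart of counts instead.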

Upvotes: 0

Trenton McKinney

Reputation: 62523

Use pandas.DataFrame.groupby, then .unstack() the loan_status level into columns and plot with stacked=True:

Aggregate by .count:

import pandas as pd
import matplotlib.pyplot as plt

# Count rows per (Principal, loan_status) pair; unstack turns loan_status into columns
df.groupby(['Principal', 'loan_status'])['loan_status'].count().unstack().plot.bar(stacked=True)
plt.show()

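For counting, `.size()` is a closely related option: it counts rows per group without selecting a column, whereas `.count()` counts non-null values in the selected column. With this data the two agree. This sketch rebuilds the question's data; the names `by_count` and `by_size` are illustrative, and `fill_value=0` is an optional choice that keeps integer zeros instead of NaN if some (Principal, loan_status) pair were missing.

```python
import pandas as pd

# Sample data from the question, rebuilt so this block runs on its own
paid = [1000] * 6 + [800] + [1000] * 4 + [800] + [1000] * 3
coll = [1000, 1000, 800, 800, 800, 1000, 1000]
df = pd.DataFrame(
    {"loan_status": ["PAIDOFF"] * len(paid) + ["COLLECTION"] * len(coll),
     "Principal": paid + coll},
    index=[*range(244, 251), *range(252, 267)],
)

# The answer's approach: count non-null loan_status values per group
by_count = df.groupby(["Principal", "loan_status"])["loan_status"].count().unstack()

# .size() counts rows per group; fill_value=0 guards against missing pairs
by_size = df.groupby(["Principal", "loan_status"]).size().unstack(fill_value=0)

print(by_count)
print(by_size)
```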

Aggregate by .sum:

df.groupby(['Principal', 'loan_status'])['Principal'].sum().unstack().plot.bar(stacked=True)
plt.show()

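Because the data are grouped by 'Principal' itself, every row in a group has the same Principal value, so each group's sum is just (row count) × (Principal). The sketch below checks that reasoning on the question's data; `sums`, `counts` and `expected` are illustrative names.

```python
import pandas as pd

# Sample data from the question, rebuilt so this block runs on its own
paid = [1000] * 6 + [800] + [1000] * 4 + [800] + [1000] * 3
coll = [1000, 1000, 800, 800, 800, 1000, 1000]
df = pd.DataFrame(
    {"loan_status": ["PAIDOFF"] * len(paid) + ["COLLECTION"] * len(coll),
     "Principal": paid + coll},
    index=[*range(244, 251), *range(252, 267)],
)

# The answer's sum aggregation
sums = df.groupby(["Principal", "loan_status"])["Principal"].sum().unstack()

# Row counts per group, multiplied row-wise by that row's Principal value
counts = df.groupby(["Principal", "loan_status"]).size().unstack()
expected = counts.mul(counts.index.to_series(), axis=0)

print(sums)
```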

Aggregate by .mean:

df.groupby(['Principal', 'loan_status'])['Principal'].mean().unstack().plot.bar(stacked=True)
plt.show()

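For the same reason, grouping by 'Principal' and then taking the mean of 'Principal' returns each group's own Principal value, so every stacked segment in the mean chart has height equal to its Principal. A short sketch confirming this on the question's data (`means` is an illustrative name):

```python
import pandas as pd

# Sample data from the question, rebuilt so this block runs on its own
paid = [1000] * 6 + [800] + [1000] * 4 + [800] + [1000] * 3
coll = [1000, 1000, 800, 800, 800, 1000, 1000]
df = pd.DataFrame(
    {"loan_status": ["PAIDOFF"] * len(paid) + ["COLLECTION"] * len(coll),
     "Principal": paid + coll},
    index=[*range(244, 251), *range(252, 267)],
)

# The answer's mean aggregation
means = df.groupby(["Principal", "loan_status"])["Principal"].mean().unstack()

print(means)
```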

Upvotes: 2
