Python: How can I get a bar chart overview showing distinct values of a data frame?

Question

I get an overview of all distinct values from a data frame with this lambda function:

overview = df.apply(lambda col: col.unique())

This returns the desired result, like this:

ColA            [1,2,3,...]
ColB            [4,5,6,7,8,9...]
ColC            [A,B,C]
...             ...

How can I visualize this result using subplots / multiple bar plots?

My first attempt was to pass the object straight to the plot method of the dataframe, which apparently does not work. So I tried to create a dataframe out of the object:

overview = {}
# obj_overview is the Series of unique-value arrays computed above
for attr, value in obj_overview.items():
    overview[attr] = value

df = pd.DataFrame(overview)

The output is:

ValueError: arrays must all be same length

So I tried using a list instead:

overview = []
for attr, value in obj_overview.items():
    overview.append({attr: value})

df = pd.DataFrame(overview)

But the result is a cross-matrix with as many rows as columns, where row n refers to column n. That is wrong, too.

How can I get an overview using multiple bar charts / sub plots showing distinct values of a data frame?

There are in fact two possible goals I'd like to achieve:

Multiple bar charts, where every chart represents one column in the original dataframe. The x-axis shows all distinct / unique values, the y-axis shows the number of occurrences of each of those values. This is the nice-to-have option. I know that my current approach cannot cover this. It's based on a similar plugin that Alteryx, for example, offers.

A single (stacked) bar chart showing all columns, where the x-axis shows every column and each bar contains all distinct values of that column. This should be possible with my current approach.

Thanks!

Henry Ecker · Accepted Answer

Separate Plots via value_counts:

import pandas as pd
from matplotlib import pyplot as plt

df = pd.DataFrame({'ColA': [1, 2, 4, 4, 5],
                   'ColB': [4, 4, 6, 6, 6],
                   'ColC': ['A', 'C', 'C', 'E', 'E']})


for col in df:
    df[col].value_counts().sort_index().plot(kind='bar', rot=0, ylabel='count')
    plt.show()

(Figures: one bar chart each for ColA, ColB, and ColC.)
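Since the question asks about subplots, the same value_counts idea can also be drawn into one figure instead of one window per column. The following is only a sketch: the helper name plot_value_counts and the figure size are assumptions made for illustration and are not part of the original answer.

```python
import pandas as pd
from matplotlib import pyplot as plt

df = pd.DataFrame({'ColA': [1, 2, 4, 4, 5],
                   'ColB': [4, 4, 6, 6, 6],
                   'ColC': ['A', 'C', 'C', 'E', 'E']})


def plot_value_counts(frame):
    """Draw one bar chart per column of `frame` in a single figure.

    Hypothetical helper for illustration; not part of pandas or matplotlib.
    """
    n_cols = len(frame.columns)
    # squeeze=False keeps `axes` two-dimensional even for a single column
    fig, axes = plt.subplots(nrows=1, ncols=n_cols,
                             figsize=(4 * n_cols, 3), squeeze=False)
    axes = axes.ravel()
    for ax, col in zip(axes, frame.columns):
        # Count occurrences of each distinct value, ordered by value
        counts = frame[col].value_counts().sort_index()
        counts.plot(kind='bar', ax=ax, rot=0, title=col)
        ax.set_ylabel('count')
    fig.tight_layout()
    return fig, axes


fig, axes = plot_value_counts(df)
```

Calling plt.show() afterwards displays the figure. For data frames with many columns, a grid with several rows would probably be easier to read than a single row.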

Single Stacked Plot via melt + crosstab:

import pandas as pd
from matplotlib import pyplot as plt

df = pd.DataFrame({'ColA': [1, 2, 4, 4, 5],
                   'ColB': [4, 4, 6, 6, 6],
                   'ColC': ['A', 'C', 'C', 'E', 'E']})

overview = df.melt()
overview = pd.crosstab(overview['variable'], overview['value'])

ax = overview.plot(kind='bar', stacked=True, rot=0, ylabel='count')
ax.legend(bbox_to_anchor=(1.2, 1))
plt.tight_layout()
plt.show()
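To see what the stacked plot is built from, the reshaping step can be inspected on its own. This is a sketch under one assumption that is not in the answer above: the values are cast to str first, so that the integer and string values share one type when they become column labels.

```python
import pandas as pd

df = pd.DataFrame({'ColA': [1, 2, 4, 4, 5],
                   'ColB': [4, 4, 6, 6, 6],
                   'ColC': ['A', 'C', 'C', 'E', 'E']})

# Assumption of this sketch: cast to str so all values have the same type
long_form = df.astype(str).melt()
# long_form has one row per cell: 'variable' is the original column name,
# 'value' is the cell content

# Count how often each value occurs in each original column
overview = pd.crosstab(long_form['variable'], long_form['value'])
print(overview)
```

Each row of overview corresponds to one original column and each column to one distinct value, so every row sums to the number of rows in df. Plotting it with stacked=True turns each row into one stacked bar.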
