Reputation: 929
I get an overview of all distinct values from a data frame with this lambda function:
overview = df.apply(lambda col: col.unique())
This returns the desired result, which looks like this:
ColA [1,2,3,...]
ColB [4,5,6,7,8,9...]
ColC [A,B,C]
... ...
How can I visualize this result using subplots / multiple bar plots?
My first attempt was to pass the object straight to the DataFrame plot method, which apparently does not work. So I tried to create a DataFrame from the object:
# obj_overview is the Series returned by df.apply(...) above
overview = {}
for attr, value in obj_overview.items():
    overview[attr] = value
df = pd.DataFrame(overview)
The output is:
ValueError: arrays must all be same length
So I tried using a list instead:
overview = []
for attr, value in obj_overview.items():
    overview.append({attr: value})
df = pd.DataFrame(overview)
But the result is a cross-matrix with as many rows as columns, where row n only holds the values for column n. That is wrong, too.
How can I get an overview using multiple bar charts / sub plots showing distinct values of a data frame?
There are in fact two possible goals I'd like to achieve:
Thanks!
Upvotes: 2
Views: 4823
Reputation: 146
This will give you one heatmap for all numerical columns and one for all alphabetical columns, where the colour represents the number of occurrences. It is an alternative way to plot the same information.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
col_dict = {
'A': [1,2,3],
'B': [3,4,4,4,5,5,6],
'C': ['A','B','C'],
'D': ['C', 'D', 'D']
}
num_cols = []
num_idx = []
letter_cols = []
letter_idx = []
# Split the columns into numerical and alphabetical groups
for col in col_dict:
    if isinstance(col_dict[col][0], int):
        num_cols += col_dict[col]
        num_idx.append(col)
    else:
        letter_cols += col_dict[col]
        letter_idx.append(col)
num_cols = sorted(list(set(num_cols)))
letter_cols = sorted(list(set(letter_cols)))
num_df = pd.DataFrame(0, index=num_idx, columns=num_cols)
letter_df = pd.DataFrame(0, index=letter_idx, columns=letter_cols)
# Count how often each value occurs in each column
for col in col_dict:
    if isinstance(col_dict[col][0], int):
        for item in col_dict[col]:
            num_df.loc[col, item] += 1
    else:
        for item in col_dict[col]:
            letter_df.loc[col, item] += 1
print(num_df)
print(letter_df)
plt.set_cmap('inferno')
plt.pcolor(num_df)
plt.yticks(np.arange(0.5, len(num_df.index), 1), num_df.index)
plt.xticks(np.arange(0.5, len(num_df.columns), 1), num_df.columns)
plt.colorbar()
plt.xlabel('Values')  # the x-axis lists the distinct values; the colour shows the counts
plt.ylabel('Columns')
plt.title('Numerical occurrences')
plt.figure()
plt.pcolor(letter_df)
plt.yticks(np.arange(0.5, len(letter_df.index), 1), letter_df.index)
plt.xticks(np.arange(0.5, len(letter_df.columns), 1), letter_df.columns)
plt.colorbar()
plt.xlabel('Values')  # the x-axis lists the distinct values; the colour shows the counts
plt.ylabel('Columns')
plt.title('Alphabetical occurrences')
plt.show()
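As a side note, the two count tables can probably be built more compactly by letting pandas do the counting. This is only a sketch under the assumption that the same col_dict is used; count_table is a hypothetical helper name, not part of pandas:

```python
import pandas as pd

col_dict = {
    'A': [1, 2, 3],
    'B': [3, 4, 4, 4, 5, 5, 6],
    'C': ['A', 'B', 'C'],
    'D': ['C', 'D', 'D']
}


def count_table(columns):
    """Hypothetical helper: rows = column names, columns = distinct values,
    cells = number of occurrences (0 where a value never appears)."""
    counts = {name: pd.Series(values).value_counts()
              for name, values in columns.items()}
    # Building a frame from the Series aligns them on the union of all values;
    # values missing from a column become NaN, which is replaced by 0 here.
    return pd.DataFrame(counts).T.fillna(0).astype(int).sort_index(axis=1)


# Same split as above: decide by the type of the first element
num_df = count_table({k: v for k, v in col_dict.items()
                      if isinstance(v[0], int)})
letter_df = count_table({k: v for k, v in col_dict.items()
                         if not isinstance(v[0], int)})

print(num_df)
print(letter_df)
```

The resulting num_df and letter_df should be usable with the same plt.pcolor calls as above.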
Upvotes: 1
Reputation: 35686
Separate Plots via value_counts:
import pandas as pd
from matplotlib import pyplot as plt
df = pd.DataFrame({'ColA': [1, 2, 4, 4, 5],
'ColB': [4, 4, 6, 6, 6],
'ColC': ['A', 'C', 'C', 'E', 'E']})
for col in df:
    # One figure per column: count each distinct value and plot it as bars
    df[col].value_counts().sort_index().plot(kind='bar', rot=0, ylabel='count')
    plt.show()
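Since the question asks about subplots, the same value_counts idea can also be drawn into one figure with one axis per column. This is a minimal sketch, not the only way to do it; plot_value_counts is a hypothetical helper name and the figure size is an arbitrary assumption:

```python
import pandas as pd
from matplotlib import pyplot as plt


def plot_value_counts(df):
    """Draw one bar chart of value counts per column, all in one figure."""
    # One row of axes, one axis per column; squeeze=False keeps `axes` 2-D
    # even when the frame has only a single column.
    fig, axes = plt.subplots(nrows=1, ncols=len(df.columns),
                             figsize=(4 * len(df.columns), 3),
                             squeeze=False)
    for ax, col in zip(axes.ravel(), df.columns):
        counts = df[col].value_counts().sort_index()
        counts.plot(kind='bar', rot=0, ax=ax, title=col)
        ax.set_ylabel('count')
    fig.tight_layout()
    return fig, axes


df = pd.DataFrame({'ColA': [1, 2, 4, 4, 5],
                   'ColB': [4, 4, 6, 6, 6],
                   'ColC': ['A', 'C', 'C', 'E', 'E']})
fig, axes = plot_value_counts(df)
# Call plt.show() afterwards to display the figure in an interactive session.
```

Passing ax= to the plot call is what places each bar chart on its own subplot instead of opening a new figure.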
Single Stacked Plot via melt + crosstab:
import pandas as pd
from matplotlib import pyplot as plt
df = pd.DataFrame({'ColA': [1, 2, 4, 4, 5],
'ColB': [4, 4, 6, 6, 6],
'ColC': ['A', 'C', 'C', 'E', 'E']})
overview = df.melt()
overview = pd.crosstab(overview['variable'], overview['value'])
ax = overview.plot(kind='bar', stacked=True, rot=0, ylabel='count')
ax.legend(bbox_to_anchor=(1.2, 1))
plt.tight_layout()
plt.show()
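To see what the stacked plot is built from, it can help to look at the intermediate table. The sketch below repeats the melt + crosstab step without plotting; long_form and table are just illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({'ColA': [1, 2, 4, 4, 5],
                   'ColB': [4, 4, 6, 6, 6],
                   'ColC': ['A', 'C', 'C', 'E', 'E']})

# melt stacks all columns into two long columns:
# 'variable' (the original column name) and 'value' (the cell content)
long_form = df.melt()

# crosstab counts how often each value occurs per original column
table = pd.crosstab(long_form['variable'], long_form['value'])
print(table)
```

Each row of the table becomes one stacked bar, and each column becomes one coloured segment in the legend.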
Upvotes: 6