Reputation: 2669
I have a DataFrame with a column of continuous values, which I turn into a cumulative sum, and a category column.
In the example below the continuous column is named 0 rather than cont_col:
import pandas as pd
import numpy as np
cont = np.random.rand(100)
df = pd.DataFrame(data=cont)
df = df.sort_values(by=0)
df['quartile'] = pd.qcut(df[0], 4, labels=False)
cumsum = df[0].cumsum()
cumsum = cumsum.to_frame()
cumsum[0].plot(kind='bar', color='k')
I would like to plot the same data, but this time coloured by the quartile column.
I can do it with the following code:
def colourise(x):
    if x == 0:
        return 'k'
    elif x == 1:
        return 'r'
    elif x == 2:
        return 'g'
    else:
        return 'b'
df['colour'] = df['quartile'].apply(colourise)
cumsum = df[0].cumsum()
cumsum = cumsum.to_frame()
cumsum[0].plot(kind='bar', color=df['colour'].tolist())
I just wonder whether there is a more general way, in particular one that doesn't depend on the number of quantiles I create.
Upvotes: 0
Views: 944
Reputation: 59549
If you don't particularly care about the colors, create a mapping with one of the seaborn color palettes. This way you just need to specify the column, not the number of categories or colors. If you have many ordered categories, consider switching to a sequential palette.
import seaborn as sns
import matplotlib.pyplot as plt
def map_color(df, col):
    color_d = dict(zip(df[col].unique(), sns.color_palette("hls", df[col].nunique())))
    df['color'] = df[col].map(color_d)
    return df
df = map_color(df, 'quartile')
fig, ax = plt.subplots(figsize=(10, 5))
df.assign(y=df[0].cumsum()).plot(kind='bar', y='y', ax=ax, color=df.color.tolist(),
                                 legend=False)
plt.show()
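For ordered categories such as quantiles, the sequential-palette suggestion could look roughly like the sketch below. It is only an illustration: the helper name map_sequential_color, the "viridis" palette, the fixed seed, and the value of n_quantiles are my own assumptions, not part of the code above. The bars are drawn with matplotlib's ax.bar directly, which takes one colour per bar.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def map_sequential_color(df, col, palette="viridis"):
    """Return one colour per row; colours follow the sorted category order.

    Hypothetical helper: the name and the default palette are assumptions.
    """
    categories = sorted(df[col].unique())
    colors = sns.color_palette(palette, len(categories))
    color_d = dict(zip(categories, colors))
    return df[col].map(color_d).tolist()


# Same kind of data as in the question, with a fixed seed for reproducibility.
rng = np.random.default_rng(0)
df = pd.DataFrame({0: rng.random(100)}).sort_values(by=0)

n_quantiles = 6  # assumed value; nothing below depends on this number
df["quartile"] = pd.qcut(df[0], n_quantiles, labels=False)

bar_colors = map_sequential_color(df, "quartile")

fig, ax = plt.subplots(figsize=(10, 5))
# One bar per row, coloured by the quantile the row belongs to.
ax.bar(range(len(df)), df[0].cumsum(), color=bar_colors)
```

Sorting the categories before zipping them with the palette is what makes the colour ramp follow the quantile order; with unique() alone the pairing depends on the order in which the categories first appear. Call plt.show() or fig.savefig(...) afterwards to see the result.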
Upvotes: 1