Reputation: 267
I'm currently experimenting with pandas and matplotlib.
I have created a Pandas dataframe which stores data like this:
cmc | coloridentity
1   | G
1   | R
2   | G
3   | G
3   | B
4   | B
What I want to do now is make a stacked bar plot that shows how many entries exist per cmc value, with the counts for each coloridentity stacked on top of each other.
My thoughts so far:
import numpy as np
from matplotlib import pyplot as plt

# get all unique values of coloridentity
unique_values = df['coloridentity'].unique()
# Create two dictionaries: one for the number of entries per cost and one
# to store the different costs for each color
color_dict_values = {}
color_dict_index = {}
for u in unique_values:
    temp_df = df['cmc'].loc[df['coloridentity'] == u].value_counts()
    color_dict_values[u] = np.array(temp_df)
    color_dict_index[u] = temp_df.index.to_numpy()

width = 0.4
p1 = plt.bar(color_dict_index['G'], color_dict_values['G'], width, color='g')
p2 = plt.bar(color_dict_index['R'], color_dict_values['R'], width,
             bottom=color_dict_values['G'], color='r')
plt.show()
But this gives me an error: in the line where I set the bottom of the second bar plot to the values of the first one, the two arrays have different NumPy shapes.
Does anyone know a solution? I thought of adding 0 values so that the shapes are the same, but I don't know if this is the best solution, and if it is, what the best way to implement it would be.
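Padding with zeros does work. A minimal sketch of that idea uses `Series.reindex` with `fill_value=0`, assuming the bars should cover every integer cmc from 1 up to the largest cmc in the data (an assumption; the question does not fix the range). The reindex step also puts each count Series in cmc order, which matters because `value_counts` sorts its result by frequency, not by cmc.

```python
import pandas as pd

# Sample rows from the table in the question
df = pd.DataFrame({'cmc': [1, 1, 2, 3, 3, 4],
                   'coloridentity': ['G', 'R', 'G', 'G', 'B', 'B']})

# Assumed x positions: every cmc from 1 to the largest one present
all_cmc = range(1, df['cmc'].max() + 1)

padded = {}
for color in df['coloridentity'].unique():
    counts = df.loc[df['coloridentity'] == color, 'cmc'].value_counts()
    # reindex sorts the counts by cmc and inserts 0 for every missing cmc
    padded[color] = counts.reindex(all_cmc, fill_value=0).to_numpy()

# every array now has length 4, one entry per cmc from 1 to 4
print(padded)
```

With equal-length, cmc-ordered arrays like these, `bottom=padded['G']` lines up with the R bars position by position, as long as the same `all_cmc` range is used as the x positions in `plt.bar`.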
Upvotes: 3
Views: 346
Reputation: 80279
Working with a fixed index (the range of cmc values) makes things easier. That way, color_dict_values[color_id] gives a count for each possible cmc value (the count stays zero when there are no entries for it).
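As a swapped-in alternative to building the dictionary by hand (not what the code in this answer does), `pd.crosstab` produces the whole fixed-index count table at once: one row per cmc, one column per coloridentity, and 0 wherever a combination does not occur. The sample rows are the ones from the question.

```python
import pandas as pd

df = pd.DataFrame({'cmc': [1, 1, 2, 3, 3, 4],
                   'coloridentity': ['G', 'R', 'G', 'G', 'B', 'B']})

# Rows: cmc values, columns: color identities, cells: number of entries
table = pd.crosstab(df['cmc'], df['coloridentity'])

# crosstab only creates rows for cmc values that occur at all; reindexing to
# a fixed range makes cmc values without any entries show up as rows of zeros
max_cmc = df['cmc'].max()
table = table.reindex(range(1, max_cmc + 1), fill_value=0)

print(table)
```

Each column of `table` plays the role of one `color_dict_values[color_id]` array (without the unused cmc 0 slot).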
The color_dict_index isn't needed anymore. To fill in color_dict_values, we iterate through the Series returned by value_counts.
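The inner loop that fills each fixed-length array could also be replaced by `np.bincount`, which counts how often each non-negative integer occurs. This is an alternative swapped in here, not the answer's code, and it assumes cmc values are non-negative integers, as they are in the question's data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'cmc': [1, 1, 2, 3, 3, 4],
                   'coloridentity': ['G', 'R', 'G', 'G', 'B', 'B']})

max_cmc = int(df['cmc'].max())

color_dict_values = {}
for u in df['coloridentity'].unique():
    cmc_of_color = df.loc[df['coloridentity'] == u, 'cmc'].to_numpy()
    # bincount counts the occurrences of each integer 0..max;
    # minlength pads every array to the same length max_cmc + 1
    color_dict_values[u] = np.bincount(cmc_of_color, minlength=max_cmc + 1)

print(color_dict_values)
```

Index 0 (cmc 0) is included, just as in the answer's arrays, so the same `[1:]` slicing applies when plotting.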
To plot the bars, the x-axis is now the range of possible cmc values. I added [1:] to each array to skip position 0 (cmc 0) at the beginning, which would otherwise show up as an empty slot in the plot.
The bottom starts at zero and is incremented by the color_dict_values of each color right after that color has been plotted. (Thanks to NumPy broadcasting, adding an array to the scalar 0 simply gives a new array with the same values, so after the first color, bottom is an array.)
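To see how the stacking offsets accumulate, here is a small sketch using the per-color counts for cmc 1 to 4 from the question's sample table (index 0 already sliced off). It records each color's starting height separately; those records are copies, because after the first color the `+=` updates `bottom` in place.

```python
import numpy as np

# Counts for cmc 1..4 taken from the sample table in the question
counts = {'G': np.array([1, 1, 1, 0]),
          'R': np.array([1, 0, 0, 0]),
          'B': np.array([0, 0, 1, 1])}

bottom = 0
starts = {}
for col_id in ['G', 'R', 'B']:
    # store a copy: the in-place += below would otherwise change it later
    starts[col_id] = np.zeros_like(counts[col_id]) + bottom
    # first pass: 0 + array creates a new array, so counts['G'] is untouched
    bottom += counts[col_id]

print(starts)
print(bottom)  # total height of each stacked bar
```

Each color starts on the sum of all colors drawn before it, and the final `bottom` holds the full bar heights.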
The full solution generates some random test data in the same format as the question's.
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

N = 50
df = pd.DataFrame({'cmc': np.random.randint(1, 10, N), 'coloridentity': np.random.choice(['R', 'G'], N)})

# get all unique values of coloridentity
unique_values = df['coloridentity'].unique()

# find the range of all cmc indices
max_cmc = df['cmc'].max()
cmc_range = range(max_cmc + 1)

# dictionary for each coloridentity: array of counts for each possible cmc
color_dict_values = {}
for u in unique_values:
    value_counts_df = df['cmc'].loc[df['coloridentity'] == u].value_counts()
    color_dict_values[u] = np.zeros(max_cmc + 1, dtype=int)
    # Series.iteritems() was removed in pandas 2.0; items() works in old and new versions
    for ind, cnt in value_counts_df.items():
        color_dict_values[u][ind] = cnt

width = 0.4
bottom = 0
for col_id, col in zip(['G', 'R'], ['limegreen', 'crimson']):
    plt.bar(cmc_range[1:], color_dict_values[col_id][1:], bottom=bottom, width=width, color=col)
    bottom += color_dict_values[col_id][1:]
plt.xticks(cmc_range[1:])  # make sure every cmc gets a tick label
plt.tick_params(axis='x', length=0)  # hide the tick marks
plt.xlabel('cmc')
plt.ylabel('count')
plt.show()
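pandas can also draw the stacked bars directly: `DataFrame.plot(kind='bar', stacked=True)` stacks the columns of a count table such as the one `pd.crosstab` returns. This is a swapped-in alternative, not the approach above. The seeded generator only makes the sketch reproducible, and the `Agg` backend line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use('Agg')  # off-screen rendering; remove this line for an interactive window
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

N = 50
rng = np.random.default_rng(0)
df = pd.DataFrame({'cmc': rng.integers(1, 10, N),
                   'coloridentity': rng.choice(['R', 'G'], N)})

# Count entries per (cmc, coloridentity); missing combinations become 0.
# The reindex keeps every cmc from 1 to the maximum and fixes the column order.
table = pd.crosstab(df['cmc'], df['coloridentity']).reindex(
    index=range(1, df['cmc'].max() + 1), columns=['G', 'R'], fill_value=0)

# Column order decides the stacking order: G at the bottom, R on top
ax = table.plot(kind='bar', stacked=True, width=0.4,
                color=['limegreen', 'crimson'], rot=0)
ax.set_xlabel('cmc')
ax.set_ylabel('count')
plt.show()
```

One difference from the loop version: pandas places the bars at positions 0, 1, 2, ... and labels them with the cmc values, so the `range(1, ...)` in the reindex decides which cmc values get a slot.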
Upvotes: 1