Reputation: 1212
I have a multiindex DataFrame that looks like the data below. When I plot the data, the graph looks like below.
How can I plot a bar graph where the color of each bar is determined by a category of my choice (e.g. 'City')? All bars belonging to the same city should have the same color, regardless of the year. For example, in the graph below, all ATL bars should be red and all MIA bars should be blue.
City            ATL                                    MIA
Year           2010         2011         2012         2010         2011         2012
Taste
Bitter  3159.861983  3149.806667  2042.348937  3124.586470  3119.541240  1070.925695
Sour    1078.897032  3204.689424  3065.818991  2084.322056  2108.568495  3178.131540
Spicy   5280.847114  3134.597728  1015.311288  2036.494136  1001.532560  3164.382635
Sweet   1056.169267  1015.368646  4217.145165  3134.734027  4144.826118  3173.919338
Below is my code:
import sys
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import random
matplotlib.style.use('ggplot')
def main():
    taste = ['Sweet','Spicy','Sour','Bitter']
    store = ['Asian','Italian','American','Greek','Mexican']
    df1 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                        'Store':[random.choice(store) for x in range(10)],
                        'Sold':1000+100*np.random.rand(10)})
    df2 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                        'Store':[random.choice(store) for x in range(10)],
                        'Sold':1000+100*np.random.rand(10)})
    df3 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                        'Store':[random.choice(store) for x in range(10)],
                        'Sold':1000+100*np.random.rand(10)})
    df4 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                        'Store':[random.choice(store) for x in range(10)],
                        'Sold':1000+100*np.random.rand(10)})
    df5 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                        'Store':[random.choice(store) for x in range(10)],
                        'Sold':1000+100*np.random.rand(10)})
    df6 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                        'Store':[random.choice(store) for x in range(10)],
                        'Sold':1000+100*np.random.rand(10)})
    df1['Year'] = '2010'
    df1['City'] = 'MIA'
    df2['Year'] = '2011'
    df2['City'] = 'MIA'
    df3['Year'] = '2012'
    df3['City'] = 'MIA'
    df4['Year'] = '2010'
    df4['City'] = 'ATL'
    df5['Year'] = '2011'
    df5['City'] = 'ATL'
    df6['Year'] = '2012'
    df6['City'] = 'ATL'
    DF = pd.concat([df1,df2,df3,df4,df5,df6])
    DFG = DF.groupby(['Taste', 'Year', 'City'])
    # Sum 'Sold' per group, then move City and Year into the columns.
    # (sum(axis=1, level=...) is not available in recent pandas versions.)
    DFGSum = DFG['Sold'].sum().unstack(['City','Year']).sort_index(axis=1)
    print(DFGSum)
    # In my plot, I want the color of the bars to be determined by the "City".
    # For example: all "ATL" bars will have the same color regardless of the year.
    DFGSum.plot(kind='bar')
    plt.show()

if __name__ == '__main__':
    main()
Upvotes: 2
Views: 5002
Reputation: 1212
I have found a solution to my own question. I give partial credit to @dermen, who originally answered it; my answer was inspired by his approach.
Although @dermen's solution is correct, I wanted a method where I don't have to manually adjust the width of the bars or worry about their positions.
The solution below can be adapted to an arbitrary number of cities and to whatever yearly data belongs to each city. Note that the DataFrame being plotted has multilevel (City, Year) columns. The solution may break if the columns are re-sorted so that a city's years are no longer adjacent, because the colors are assigned in the order the columns are plotted.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import random
matplotlib.style.use('ggplot')
taste = ['Sweet','Spicy','Sour','Bitter']
store = ['Asian','Italian','American','Greek','Mexican']
df1 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
'Store':[random.choice(store) for x in range(10)],
'Sold':1000+100*np.random.rand(10)})
df2 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
'Store':[random.choice(store) for x in range(10)],
'Sold':1000+100*np.random.rand(10)})
df3 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
'Store':[random.choice(store) for x in range(10)],
'Sold':1000+100*np.random.rand(10)})
df4 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
'Store':[random.choice(store) for x in range(10)],
'Sold':1000+100*np.random.rand(10)})
df5 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
'Store':[random.choice(store) for x in range(10)],
'Sold':1000+100*np.random.rand(10)})
df6 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
'Store':[random.choice(store) for x in range(10)],
'Sold':1000+100*np.random.rand(10)})
df7 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
'Store':[random.choice(store) for x in range(10)],
'Sold':1000+100*np.random.rand(10)})
df8 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
'Store':[random.choice(store) for x in range(10)],
'Sold':1000+100*np.random.rand(10)})
df9 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
'Store':[random.choice(store) for x in range(10)],
'Sold':1000+100*np.random.rand(10)})
df10 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
'Store':[random.choice(store) for x in range(10)],
'Sold':1000+100*np.random.rand(10)})
df1['Year'] = '2010'
df1['City'] = 'MIA'
df2['Year'] = '2011'
df2['City'] = 'MIA'
df3['Year'] = '2012'
df3['City'] = 'MIA'
df4['Year'] = '2010'
df4['City'] = 'ATL'
df5['Year'] = '2011'
df5['City'] = 'ATL'
df6['Year'] = '2012'
df6['City'] = 'ATL'
df7['Year'] = '2013'
df7['City'] = 'ATL'
df8['Year'] = '2014'
df8['City'] = 'ATL'
df9['Year'] = '2013'
df9['City'] = 'CHI'
df10['Year'] = '2014'
df10['City'] = 'CHI'
DF = pd.concat([df1,df2,df3,df4,df5,df6,df7,df8,df9,df10])
DFG = DF.groupby(['Taste', 'Year', 'City'])
# DFGSum is a multilevel DataFrame with (City, Year) columns, sorted so each city's years are adjacent
DFGSum = DFG['Sold'].sum().unstack(['City','Year']).sort_index(axis=1)

import itertools
# Colors of the current style ('axes.color_cycle' no longer exists in current matplotlib)
color_cycle = itertools.cycle(plt.rcParams['axes.prop_cycle'].by_key()['color'])
plot_colors = []  # Sequence of colors, one per column to be plotted
for city in DFGSum.columns.get_level_values('City').unique():
    set_color = next(color_cycle)  # Set the color for the city
    for year in DFGSum[city].columns.get_level_values('Year').unique():
        plot_colors.append(set_color)
# For each unique city, all the yearly data belonging to that city will have the same color
DFGSum.plot(kind='bar', color=plot_colors)
# The color parameter of the plot function accepts a list of colors, one per column
plt.show()
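As a possible way around the ordering caveat mentioned above, the color list can be built by looking up each column's city directly instead of relying on a city's years being adjacent. This is only a sketch: `colors_for_level` is a hypothetical helper, the palette is hard-coded for illustration, and `demo` is a small made-up stand-in for `DFGSum` with deliberately interleaved cities.

```python
import itertools
import pandas as pd

def colors_for_level(columns, level, palette):
    """Return one color per column, keyed on the values of one column level."""
    values = columns.get_level_values(level)
    cycle = itertools.cycle(palette)
    # Give each distinct value the next palette color, in order of first appearance.
    color_of = {value: next(cycle) for value in values.unique()}
    return [color_of[value] for value in values]

# Made-up stand-in for DFGSum; the cities are interleaved on purpose.
columns = pd.MultiIndex.from_tuples(
    [('ATL', '2010'), ('MIA', '2010'), ('ATL', '2011'), ('MIA', '2011')],
    names=['City', 'Year'])
demo = pd.DataFrame([[1.0, 2.0, 3.0, 4.0]], index=['Sweet'], columns=columns)

bar_colors = colors_for_level(demo.columns, 'City', ['red', 'blue'])
print(bar_colors)  # ['red', 'blue', 'red', 'blue']
```

The resulting list could then be passed to the plot call in the same way as `plot_colors`, e.g. `demo.plot(kind='bar', color=bar_colors)`.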
Upvotes: 3
Reputation: 5362
You will need to specify a few extra arguments to get it to look nice, but something like this might work:
import itertools  # for color cycling
# specify the colors you want for each city
color_cycle = itertools.cycle(plt.rcParams['axes.prop_cycle'].by_key()['color'])
colors = {cty: next(color_cycle) for cty in DF.City.unique()}
# specify the relative position of each bar
n = len(list(DFGSum))
positions = np.linspace(-n/2., n/2., n)
# plot each column individually
for i, col in enumerate(list(DFGSum)):
    c = colors[col[0]]  # col is a (City, Year) tuple
    pos = positions[i]
    DFGSum[col].plot(kind='bar', color=c,
                     position=pos, width=0.05)
plt.legend()
plt.show()
Though here you cannot tell which bar corresponds to which year...
You can also make a slightly different kind of plot that preserves the year information in the tick labels. It generalizes to any number of cities and keeps the default color style:
from functools import reduce  # a builtin on Python 2, but must be imported on Python 3
df = DFG['Sold'].sum().reset_index().set_index(['Taste','Year'])
u_cty = df.City.unique()  # array(['ATL', 'MIA'], dtype=object)
df_list = []
for cty in u_cty:
    d = df.loc[df.City == cty]
    d = d[['Sold']].rename(columns={'Sold': cty}).reset_index()
    df_list.append(d)
# merge the per-city dataframes on Taste and Year
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['Taste','Year'], how='outer'), df_list)
df_merged.set_index(['Taste','Year'], inplace=True)
                     ATL          MIA
Taste  Year
Bitter 2010  3211.239754  2070.907629
       2011  2158.068222  2145.373251
       2012  2138.624730  1062.306874
Sour   2010  4188.024600          NaN
       2011  4323.003409          NaN
       2012  1042.772615  2136.742869
Spicy  2010  1018.737977  3155.450265
       2012  4171.954201  2096.569762
Sweet  2010  2098.679545  5324.078957
       2011  4215.376670  2115.964824
       2012  3152.998667  5277.410536
Spicy  2011          NaN  6295.032147
df_merged.plot(kind='bar')
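A table of the same shape as `df_merged` can likely also be obtained without the loop and merge, by unstacking only the City level of the grouped sums. The sketch below uses a tiny made-up `DF` (not the random data from the question) so that it runs on its own; `by_city` is just an illustrative name.

```python
import pandas as pd

# Made-up sample with the same columns as DF in the question.
DF = pd.DataFrame({
    'Taste': ['Sweet', 'Sweet', 'Sour', 'Sweet', 'Sour'],
    'Year':  ['2010', '2010', '2010', '2011', '2011'],
    'City':  ['ATL', 'MIA', 'ATL', 'ATL', 'MIA'],
    'Sold':  [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Rows stay indexed by (Taste, Year); each city becomes one column.
# Combinations with no sales show up as NaN, as with the outer merge.
by_city = DF.groupby(['Taste', 'Year', 'City'])['Sold'].sum().unstack('City')
print(by_city)
```

Calling `by_city.plot(kind='bar')` should then give one color per city with (Taste, Year) tick labels, much like the merged frame.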
Upvotes: 4