aska

Reputation: 73

Altair bar chart with bars of variable width?

I'm trying to use Altair in Python to make a bar chart where the bars have varying width depending on the data in a column of the source dataframe. The ultimate goal is to get a chart like this one:

A bar chart with bars of variable width

The height of each bar corresponds to the marginal cost of an energy technology (given as a column in the source dataframe). The bar width corresponds to the capacity of that energy technology (also given as a column in the source dataframe). Colors are ordinal data, also from the source dataframe. The bars are sorted in increasing order of marginal cost. (A plot like this is called a "generation stack" in the energy industry.) This is easy to achieve in matplotlib, as shown in the code below:

import matplotlib.pyplot as plt 

# Make fake dataset
height = [3, 12, 5, 18, 45]
bars = ('A', 'B', 'C', 'D', 'E')

# Choose the width of each bar and the x position of each bar's center
width = [0.1, 0.2, 3, 1.5, 0.3]
x_pos = [0, 0.3, 2, 4.5, 5.5]

# Make the plot
plt.bar(x_pos, height, width=width)
plt.xticks(x_pos, bars)
plt.show()

(code from https://python-graph-gallery.com/5-control-width-and-space-in-barplots/)

But is there a way to do this with Altair? I would like to use Altair so that I keep its other great features, such as tooltips and selectors/bindings, since I have lots of other data I want to show alongside the bar chart.

The first 20 rows of my source data look like this:

enter image description here

(The data does not exactly match the chart shown above.)

Upvotes: 6

Views: 2831

Answers (1)

jakevdp

Reputation: 86533

In Altair, the way to do this would be to use the rect mark and construct your bars explicitly. Here is an example that mimics your data:

import altair as alt
import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame({
    'MarginalCost': 100 * np.random.rand(30),
    'Capacity': 10 * np.random.rand(30),
    'Technology': np.random.choice(['SOLAR', 'THERMAL', 'WIND', 'GAS'], 30)
})

df = df.sort_values('MarginalCost')
df['x1'] = df['Capacity'].cumsum()
df['x0'] = df['x1'].shift(fill_value=0)

alt.Chart(df).mark_rect().encode(
    x=alt.X('x0:Q', title='Capacity'),
    x2='x1',
    y=alt.Y('MarginalCost:Q', title='Marginal Cost'),
    color='Technology:N',
    tooltip=["Technology", "Capacity", "MarginalCost"]
)
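
To see what the cumsum/shift step produces, here is a small sketch that can be checked by hand. The three-row frame and its values are made up for illustration; only the column names follow the example above.

```python
import pandas as pd

# Hypothetical three-row dataset, small enough to verify by hand.
small = pd.DataFrame({
    'Technology': ['GAS', 'WIND', 'SOLAR'],
    'MarginalCost': [40.0, 5.0, 10.0],
    'Capacity': [3.0, 2.0, 4.0],
})

# Sort cheapest first, then lay the bars end to end along the x axis.
small = small.sort_values('MarginalCost')
small['x1'] = small['Capacity'].cumsum()        # right edge of each bar
small['x0'] = small['x1'].shift(fill_value=0)   # left edge = previous right edge

print(small[['Technology', 'x0', 'x1']])
```

Each bar starts where the previous one ends, so the widths (x1 - x0) recover the Capacity values and the total extent of the x axis equals the summed capacity.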

To get the same result without preprocessing the data, you can use Altair's transform syntax:

df = pd.DataFrame({
    'MarginalCost': 100 * np.random.rand(30),
    'Capacity': 10 * np.random.rand(30),
    'Technology': np.random.choice(['SOLAR', 'THERMAL', 'WIND', 'GAS'], 30)
})

alt.Chart(df).transform_window(
    x1='sum(Capacity)',
    sort=[alt.SortField('MarginalCost')]
).transform_calculate(
    x0='datum.x1 - datum.Capacity'
).mark_rect().encode(
    x=alt.X('x0:Q', title='Capacity'),
    x2='x1',
    y=alt.Y('MarginalCost:Q', title='Marginal Cost'),
    color='Technology:N',
    tooltip=["Technology", "Capacity", "MarginalCost"]
)
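
As a rough picture of what the window and calculate transforms compute, the same arithmetic can be written in plain Python. This is only an illustrative sketch, not how Vega-Lite implements the transforms; the `rows` data and the `add_edges` helper are hypothetical.

```python
from itertools import accumulate

# Hypothetical records standing in for rows of the dataframe.
rows = [
    {'Technology': 'GAS', 'MarginalCost': 40.0, 'Capacity': 3.0},
    {'Technology': 'WIND', 'MarginalCost': 5.0, 'Capacity': 2.0},
    {'Technology': 'SOLAR', 'MarginalCost': 10.0, 'Capacity': 4.0},
]

def add_edges(rows):
    """Running sum of Capacity in MarginalCost order (the window step),
    then x0 = x1 - Capacity (the calculate step)."""
    ordered = sorted(rows, key=lambda r: r['MarginalCost'])
    running = accumulate(r['Capacity'] for r in ordered)
    return [
        {**r, 'x1': x1, 'x0': x1 - r['Capacity']}
        for r, x1 in zip(ordered, running)
    ]

edges = add_edges(rows)
for r in edges:
    print(r['Technology'], r['x0'], r['x1'])
```

Subtracting Capacity from the running sum gives the same left edge as shifting the cumulative sum by one row, which is why the two versions of the chart should agree.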

Upvotes: 8
