Reputation: 2908
Using an example from the docs, I can sort the segments within each stacked bar using order, but I want the bars themselves sorted along the Y-axis by the summed yield of one site (Crookston, i.e. the blue segment), in ascending or descending order.
Based on this post I tried using transform_calculate and transform_joinaggregate, but it doesn't work as expected.
import altair as alt
from vega_datasets import data
source = data.barley()
alt.Chart(source).mark_bar().transform_calculate(
    key="datum.site == 'Crookston'"
).transform_joinaggregate(
    sort_key="argmax(key)", groupby=['variety']
).transform_calculate(
    sort_val='datum.sort_key.value'
).encode(
    x=alt.X('sum(yield)', stack='normalize'),
    y=alt.Y('variety', sort=alt.SortField('sort_val', order='ascending')),
    color='site',
    order=alt.Order(
        # Sort the segments of the bars by this field
        'site',
        sort='ascending'
    )
)
Expected Output
The bars along Y-axis are sorted by the size of blue (site=Crookston) bar.
Upvotes: 1
Views: 616
Reputation: 25
As demonstrated in jakevdp's answer, you can start by using a Calculate transform to define a new field that copies the yield when the site is "Crookston" and is 0 otherwise. From there, it is not necessary to perform a Join Aggregate transform; by default, SortField sums the sort field across each variety, so the y-axis sort already uses the total Crookston yield.
import altair as alt
from vega_datasets import data
source = data.barley()
alt.Chart(source).mark_bar().transform_calculate(
    filtered="datum.site == 'Crookston' ? datum.yield : 0"
).encode(
    x=alt.X("sum(yield)"),
    y=alt.Y(
        "variety",
        sort=alt.SortField("filtered", order="ascending"),
    ),
    color="site",
    order=alt.Order(
        # Sort the segments of the bars by this field
        "site",
        sort="ascending",
    ),
)
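As a rough illustration of what the calculated field contributes to the sort, the same arithmetic can be sketched in plain Python, outside Altair. The records below are a small hand-picked subset (the values are the ones shown in the table later in this answer), and the names records, totals and order are only illustrative.

```python
# (variety, site, yield) for a few barley records; a hand-picked subset, not the full dataset
records = [
    ("Velvet", "Crookston", 41.33333),
    ("Velvet", "Crookston", 32.06666),
    ("Velvet", "Duluth", 22.46667),
    ("Manchuria", "Crookston", 39.93333),
    ("Manchuria", "Crookston", 32.96667),
    ("Manchuria", "Duluth", 22.56667),
]

totals = {}
for variety, site, yld in records:
    # Mirrors the expression "datum.site == 'Crookston' ? datum.yield : 0"
    filtered = yld if site == "Crookston" else 0
    # Summing the filtered field per variety gives the Crookston total
    totals[variety] = totals.get(variety, 0) + filtered

# Ascending order of the per-variety Crookston totals
order = sorted(totals, key=totals.get)
print(totals, order)
```

Rows from other sites contribute 0, so the per-variety sum is exactly the Crookston total, which is the quantity the y-axis should be sorted by.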
In fact, the previous answer's method using transform_joinaggregate does not work as expected in general, and only works in this example because the source dataset has the exact same number of records for each variety. For instance, if you add a record of the "Manchuria" variety with a yield of 0 to the Crookston site, that method will now sort Manchuria two places farther down on the y-axis, below Velvet and No. 475, and above No. 462.
import pandas as pd
source = data.barley()
# DataFrame.append was removed in pandas 2.0; pd.concat adds the extra record instead
extra = pd.DataFrame([{"yield": 0, "variety": "Manchuria", "site": "Crookston"}])
source = pd.concat([source, extra], ignore_index=True)
alt.Chart(source).mark_bar().transform_calculate(
    filtered="datum.site == 'Crookston' ? datum.yield : 0"
).transform_joinaggregate(
    sort_val="sum(filtered)", groupby=["variety"]
).encode(
    x=alt.X("sum(yield)"),
    y=alt.Y("variety", sort=alt.SortField("sort_val", order="ascending")),
    color="site",
    order=alt.Order("site", sort="ascending"),
)
It's visually apparent that the chart is no longer sorted as desired. Adding a yield of zero should not have affected the sort order; the Manchuria variety still has a smaller yield in Crookston than the Velvet and No. 475 varieties.
To see what went wrong, you can open the chart produced by the second code block in Vega Editor. There you will find a table called "data_0" with entries including the following (not in this order):
yield | variety | year | site | filtered | sort_val |
---|---|---|---|---|---|
39.93333 | "Manchuria" | 1931 | "Crookston" | 39.93333 | 72.9 |
32.96667 | "Manchuria" | 1932 | "Crookston" | 32.96667 | 72.9 |
0 | "Manchuria" | null | "Crookston" | 0 | 72.9 |
22.56667 | "Manchuria" | 1932 | "Duluth" | 0 | 72.9 |
41.33333 | "Velvet" | 1931 | "Crookston" | 41.33333 | 73.39999 |
32.06666 | "Velvet" | 1932 | "Crookston" | 32.06666 | 73.39999 |
22.46667 | "Velvet" | 1932 | "Duluth" | 0 | 73.39999 |
48.56666 | "No. 462" | 1931 | "Crookston" | 48.56666 | 79.09999 |
The sort_val values for Manchuria (72.9) are less than those for Velvet, as they should be. However, Vega still needs to determine how to aggregate the duplicate values of sort_val that appear in each row for a given variety. The default behavior for stacked plots is to sum all of the entries in the sort field across the group it is trying to sort (see: https://vega.github.io/vega-lite/docs/sort.html#sort-by-a-different-field), a fact that came in handy in the first code block.
The source data set had 12 entries for each variety to begin. After adding a record, there are now 13 entries of the Manchuria variety, so Manchuria gets a sort value of 72.9 · 13 = 947.7, which is larger than Velvet's sort value of 73.39999 · 12 ≈ 880.8, but still smaller than variety No. 462's sort value of 79.09999 · 12 ≈ 949.2. This reflects what was seen in the second chart.
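The arithmetic above can be checked with a few lines of plain Python. This is only a sketch of the summing behavior described in this answer, not of Vega's internals; the sort_val and row_count dictionaries are filled in by hand from the numbers quoted above.

```python
# Per-variety Crookston totals (the joinaggregate sort_val) quoted above
sort_val = {"Manchuria": 72.9, "Velvet": 73.39999, "No. 462": 79.09999}
# Rows per variety after the extra Manchuria record is added
row_count = {"Manchuria": 13, "Velvet": 12, "No. 462": 12}

# Summing a value that is repeated on every row multiplies it by the row count
summed = {v: sort_val[v] * row_count[v] for v in sort_val}
order_by_sum = sorted(summed, key=summed.get)

# Using a single value per variety keeps the intended order
order_by_single = sorted(sort_val, key=sort_val.get)

print(order_by_sum)     # Manchuria lands between Velvet and No. 462
print(order_by_single)  # Manchuria comes first
```

The extra row changes only the multiplier for Manchuria, which is enough to push it past Velvet even though its Crookston total is smaller.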
To fix this, you can specify that only a single sort_val should be used as the sorting value for each variety, by using EncodingSortField instead of SortField, and passing "min", "max", or "average" as the aggregation operation to the op parameter, e.g. sort=alt.EncodingSortField("sort_val", op="min", order="ascending"). Or you can use the first method above and skip the Join Aggregate transform.
Upvotes: 0
Reputation: 86300
Each colored bar in your chart represents the sum of all yields within that site and variety, for all years in the dataset. When you use argmax, you are sorting by a single year's Crookston yield, not the total Crookston yield among all years. You can get the latter with a slightly different transform strategy:
import altair as alt
from vega_datasets import data
source = data.barley()
alt.Chart(source).mark_bar().transform_calculate(
    filtered="datum.site == 'Crookston' ? datum.yield : 0"
).transform_joinaggregate(
    sort_val="sum(filtered)", groupby=["variety"]
).encode(
    x=alt.X('sum(yield)', stack='normalize'),
    y=alt.Y('variety', sort=alt.SortField('sort_val', order='ascending')),
    color='site',
    order=alt.Order(
        # Sort the segments of the bars by this field
        'site',
        sort='ascending'
    )
)
The result is correctly sorted by the total yield from Crookston, as you can confirm by removing the normalization in the x encoding:
alt.Chart(source).mark_bar().transform_calculate(
    filtered="datum.site == 'Crookston' ? datum.yield : 0"
).transform_joinaggregate(
    sort_val="sum(filtered)", groupby=["variety"]
).encode(
    x=alt.X('sum(yield)'),
    y=alt.Y('variety', sort=alt.SortField('sort_val', order='ascending')),
    color='site',
    order=alt.Order(
        'site',
        sort='ascending'
    )
)
Upvotes: 1