Reputation: 642
I would like to create a stacked graph bar similar to https://vega.github.io/vega-lite/examples/stacked_bar_weather.html
My dataset contains
The number of countries is significant but I only want to show the top 5 and group the rest under the label OTHER. I want to select the top 5 over the complete time period of one year (not the top 5 per each individual day).
I am able to select the top-5 with the following top-5 transform:
"transform" : [
{
"aggregate" : [ {"op": "sum", "field": "hit", "as": "hit_sum_by_country"} ],
"groupby" : ["country"]
},
{
"window": [ {"op": "row_number", "as": "rank"} ],
"sort": [ {"field": "hit_sum_by_country", "order": "descending"} ]
},
{
"calculate": "datum.rank < 5 ? datum.country : 'others'", "as" : "top_country"
}
],
This, however, munges my original data.
So it looks like I would need to do the top-5 transform on the side
and then somehow inject "top_country"
to the original dataset.
Is this possible? Or should I be doing this outside vega (i.e.
in javascript or server-side)
UPDATE: an example in the vega editor. The chart is hard to read because of the number of countries that are color-coded. I would like to select top-5 and then group (sum) all others under one label/color OTHER.
UPDATE-2 another example linked from the comment
Upvotes: 1
Views: 556
Reputation: 86300
I think you can accomplish what you want to do with the following transform:
"transform": [
{
"joinaggregate": [{"op": "sum", "field": "hits", "as": "hit_sum_by_country"}],
"groupby": ["country"]
},
{
"window": [{"op": "row_number", "as": "rank"}],
"sort": [{"field": "hit_sum_by_country", "order": "descending"}],
"groupby": ["date"]
},
{"calculate": "datum.rank <= 5 ? datum.country : 'Other'", "as": "country"}
],
Note: this pre-supposes that each country has an entry at each date; if not, you'll also have to add an imputation transform before the window transform.
Here's the result (view in editor):
Edit: here's what the transform might look like if you need to impute missing values:
"transform": [
{"impute": "hits", "key": "country", "value": 0, "groupby": ["date"]},
{
"joinaggregate": [
{"op": "sum", "field": "hits", "as": "hit_sum_by_country"}
],
"groupby": ["country"]
},
{
"window": [{"op": "row_number", "as": "rank"}],
"sort": [{"field": "hit_sum_by_country", "order": "descending"}],
"groupby": ["date"]
},
{
"calculate": "datum.rank <= 5 ? datum.country : 'Other'",
"as": "top_country"
}
],
The result looks something like this, using the data from your updated example (view in editor):
Upvotes: 1