FlorianGD

Reputation: 2436

mark_line in altair/vega-lite reorders the data when x axis contains duplicates

I noticed that when the x axis contains a duplicate value with different y values, the order in which the data is provided is not taken into account. The maximum value is linked to the previous point and the minimum to the next point. This is not what I expect when creating a CDF (cumulative distribution function), for example.

I tried providing an EncodingSortField with the index, but this doesn't work. I can plot the chart I want by removing the row in the data with the minimum value, but then I need to manually add the point.

Is this by design? Or am I missing something?

Below is a reproducible example.

import pandas as pd
import altair as alt

df = pd.DataFrame({'x':[-1, 0, 0, 1, 2],
                   'y':[-1, 0, 1, 2, 3],
                   'index':[0, 1, 2, 3, 4]})

step = alt.Chart(df).mark_line(interpolate="step", point=True).encode(
    x='x:Q', 
    y='y:Q',
).properties(width=150, 
             height=150, 
             title="interpolate='step'")

step_after = step.mark_line(
    interpolate='step-after', 
    point=True
).properties(title="interpolate=step-after")

step_before = step.mark_line(
    interpolate='step-before', 
    point=True
).properties(title="interpolate=step-before")

sort = step.encode(
    y=alt.Y('y:Q', 
            sort=alt.EncodingSortField(field='index', 
                                       op='sum'))
).properties(title='sort by index')

expected = (step_before.properties(data=df[df.index != 1], 
                                   title='expected') + 
            alt.Chart(pd.DataFrame([{'x':0, 
                                     'y':0}])
                     ).mark_circle().encode(
                x='x:Q', y='y:Q')
           )

(step | step_before | step_after) & (sort | expected)

[Image: altair-charts]

Created on 2018-11-15 by the reprexpy package

import reprexpy
print(reprexpy.SessionInfo())
#> Session info --------------------------------------------------------------------
#> Platform: Darwin-18.2.0-x86_64-i386-64bit (64-bit)
#> Python: 3.6
#> Date: 2018-11-15
#> Packages ------------------------------------------------------------------------
#> altair==2.2.2
#> pandas==0.23.4
#> reprexpy==0.2.1

Thanks.

Upvotes: 2

Views: 611

Answers (1)

jakevdp

Reputation: 86320

The order of the data rows passed into Altair is not preserved in the chart output, and this is by design.

If you want your data entries to be plotted in a particular order, you can use the order encoding to explicitly specify that; an example from the documentation is here: https://altair-viz.github.io/gallery/connected_scatterplot.html

In your case, if you pass order="index:Q" to your list of encodings, I believe the result will be what you expected.

Upvotes: 2
