I noticed that when there is a duplicate x value with different y values, the order in which the data is provided is not taken into account: the point with the maximum y value is connected to the preceding point, and the one with the minimum y value to the following point. This is not what I expect when creating a CDF (cumulative distribution function), for example.
I tried providing an EncodingSortField with the index, but this doesn't work. I can plot the chart I want by removing the row with the minimum value from the data, but then I need to add that point back manually.
Is this by design? Or am I missing something?
Below is a reproducible example.
import pandas as pd
import altair as alt
df = pd.DataFrame({'x': [-1, 0, 0, 1, 2],
                   'y': [-1, 0, 1, 2, 3],
                   'index': [0, 1, 2, 3, 4]})
step = alt.Chart(df).mark_line(interpolate="step", point=True).encode(
    x='x:Q',
    y='y:Q',
).properties(width=150,
             height=150,
             title="interpolate='step'")
step_after = step.mark_line(
    interpolate='step-after',
    point=True
).properties(title="interpolate=step-after")
step_before = step.mark_line(
    interpolate='step-before',
    point=True
).properties(title="interpolate=step-before")
# Attempt: sort by the 'index' column (no visible effect on the line path)
sort = step.encode(
    y=alt.Y('y:Q',
            sort=alt.EncodingSortField(field='index',
                                       op='sum'))
).properties(title='sort by index')
# Workaround: drop the duplicate row with the smaller y value (row 1)
# and add that point back as a separate circle layer
expected = (step_before.properties(data=df[df.index != 1],
                                   title='expected') +
            alt.Chart(pd.DataFrame([{'x': 0, 'y': 0}])).mark_circle().encode(
                x='x:Q', y='y:Q')
            )
(step | step_before | step_after) & (sort | expected)
Created on 2018-11-15 by the reprexpy package
import reprexpy
print(reprexpy.SessionInfo())
#> Session info --------------------------------------------------------------------
#> Platform: Darwin-18.2.0-x86_64-i386-64bit (64-bit)
#> Python: 3.6
#> Date: 2018-11-15
#> Packages ------------------------------------------------------------------------
#> altair==2.2.2
#> pandas==0.23.4
#> reprexpy==0.2.1
Thanks.
The order of the data rows passed into Altair is not preserved in the chart output, and this is by design.
If you want your data entries to be plotted in a particular order, you can use the order encoding to specify it explicitly; an example from the documentation is here: https://altair-viz.github.io/gallery/connected_scatterplot.html
In your case, if you add order="index:Q" to your list of encodings, I believe the result will be what you expected.