bet_bit

Reputation: 58

Problems plotting time-series interactively with Altair

Description of the problem

My goal is quite basic: to plot time series in an interactive plot. After some research I decided to give Altair a try. There are already QGIS plugins for time-series visualisation, but as far as I'm aware, none that plot time series at vector level by interactively clicking on a map and selecting a polygon. That's why I decided to go for a self-made solution using Altair, maybe combining it with Folium to add functionality later on.

I'm totally new to the Altair library (as well as Vega and Vega-Lite), and quite new to data science and data visualisation as well... so apologies in advance for my ignorance!

There are already well-explained tutorials on how to plot time series with Altair (for example here, or on the official website). However, my use case has some particularities that, as far as I've seen, have not yet been addressed together.

The data is produced using the Python API for Google Earth Engine and preprocessed with Python and the pandas/geopandas libraries:

Data overview after preprocessing:

Data overview after preprocessing

My "final" goal would be to show, in a single chart, an interactive line plot with one line per agricultural parcel, with parcels categorized by crop type in different colours, e.g. corn in green, wheat in yellow, pear trees in brown... (the crop type of each parcel can be added to the DataFrame by joining it with another DataFrame).

I am thinking of something that looks more or less like the following example, where the legend entries (years in the example) would instead be the parcels, coloured by crop type:

line plot

But so far I haven't managed to make my data look this way... at all.

As you can see, there are many nulls in the data (this is due to the application of a cloud masking function and to the fact that several Sentinel-2 orbits intersect the ROI). I would like to simply omit the null values for each column/parcel, but I don't know whether this data layout can cause problems (any advice on that?).

So far I got: Unsuccessful attempt

I am surely doing many things wrong. It would be great to get some advice on how to solve (some of) them.

Sample of the data and code to reproduce the issue

Here's a text sample of the data in JSON format, and the code used to reproduce the issue is the following:

import pandas as pd
import geopandas as gpd
import altair as alt

df = pd.read_json(r"path\to\json\file.json")
df['date'] = pd.to_datetime(df['date'])
print(df.dtypes)
df

Output:

out1

lines=alt.Chart(df).mark_line().encode(
    x='date:O',
    y='17811:Q',
    color=alt.Color(
        '17811:Q', scale=alt.Scale(scheme='redyellowgreen', domain=(-1, 1)))
    )
lines.properties(width=700, height=600).interactive()

Output:

out2

Thanks in advance for your help!

Upvotes: 2

Views: 667

Answers (1)

joelostblom

Reputation: 48889

If I understand correctly, it is mostly the format of your dataframe that needs to change from wide to long, which you can do either via .melt in pandas or .transform_fold in Altair. With melt, the default names for the melted columns are 'variable' (the former column names) and 'value' (the values from each column):

alt.Chart(df.melt(id_vars='date'), width=500).mark_line().encode(
    x='date:T',
    y='value',
    color=alt.Color('variable')
)


The gaps come from the NaNs; if you want Altair to draw the lines straight across the missing values, you can drop the NaNs:

alt.Chart(df.melt(id_vars='date').dropna(), width=500).mark_line().encode(
    x='date:T',
    y='value',
    color=alt.Color('variable')
)

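To see what melt and dropna do to the table itself, here is a minimal sketch on a made-up three-row frame. The parcel IDs and values are placeholders for illustration, not the data from the question.

```python
import pandas as pd

# Hypothetical toy data: one 'date' column plus one column per parcel.
wide = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-06", "2020-01-11"]),
    "17811": [0.2, None, 0.4],
    "17812": [None, 0.5, 0.6],
})

# Wide -> long: one row per (date, parcel) pair, with the default
# column names 'variable' (former column name) and 'value'.
long = wide.melt(id_vars="date")
print(long)

# Dropping the NaN rows keeps only the dates on which each parcel
# actually has an observation.
long_valid = long.dropna()
print(long_valid)
```

Each parcel keeps its own set of dates after dropna, so parcels observed on different dates are not a problem in the long format.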

If you want to do it all in Altair, the following is equivalent to the last pandas example above (the transform uses 'key' instead of 'variable' as the name for the former columns). I also use an ordinal instead of a nominal type for the color encoding to show how to make the colors more similar to your example:

alt.Chart(df, width=500).mark_line().encode(
    x='date:T',
    y='value:Q',
    color=alt.Color('key:O')
).transform_fold(
    df.drop(columns='date').columns.tolist()
).transform_filter(
    'isValid(datum.value)'
)


Upvotes: 1
