Plotly: How to use most recent value until new value available to deal with missing values in a time series?

Question

I have two series in pandas. One has an entry for every date; the other has entries only sporadically.

When plotting df2['Actual'] in the example below, what's the best way to plot the most recent value at each time point rather than drawing a straight line between recorded points? In this example, the Actuals line would stay at 90 on the y-axis until 2020-06-03, when it would jump to 280.

import pandas as pd
import plotly.graph_objs as go

d1 = {'Index': [1, 2, 3, 4, 5, 6],
      'Time': ["2020-06-01", "2020-06-02", "2020-06-03", "2020-06-04", "2020-06-05", "2020-06-06"],
      'Pred': [100, -200, 300, -400, -500, 600]
      }

d2 = {'Index': [1, 2, 3],
      'Time': ["2020-06-01", "2020-06-03", "2020-06-06"],
      'Actual': [90, 280, 650]
      }
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)

def plot_over_time(df1, df2):
    fig = go.Figure()
    fig.add_trace(dict(
        x=df1['Time'], y=df1['Pred'],
        mode='lines+markers',
        marker=dict(size=10),
        name="Preds"))
    fig.add_trace(dict(
        x=df2['Time'], y=df2['Actual'],
        mode='lines+markers',
        marker=dict(size=10),
        name="Actuals"))
    fig.show()

plot_over_time(df1, df2)

vestland · Accepted Answer

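Use line_shape='hv' for each go.Scatter trace. This draws a step line: each value is held horizontally until the next recorded point, where the line jumps vertically to the new value.

This way, Plotly takes care of the visual representation of the data, so there's no need to forward-fill the values with pandas first.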

Complete code:

import pandas as pd
import plotly.graph_objs as go

d1 = {'Index': [1, 2, 3, 4, 5, 6],
      'Time': ["2020-06-01", "2020-06-02", "2020-06-03", "2020-06-04", "2020-06-05", "2020-06-06"],
      'Pred': [100, -200, 300, -400, -500, 600]
      }

d2 = {'Index': [1, 2, 3],
      'Time': ["2020-06-01", "2020-06-03", "2020-06-06"],
      'Actual': [90, 280, 650]
      }
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)

def plot_over_time(df1, df2):
    fig = go.Figure()
    # line_shape='hv' holds each value horizontally until the next
    # point, then steps vertically to the new value.
    fig.add_trace(go.Scatter(
        x=df1['Time'], y=df1['Pred'],
        mode='lines+markers',
        marker=dict(size=10),
        name="Preds", line_shape='hv'))
    fig.add_trace(go.Scatter(
        x=df2['Time'], y=df2['Actual'],
        mode='lines+markers',
        marker=dict(size=10),
        name="Actuals", line_shape='hv'))
    fig.show()

plot_over_time(df1, df2)

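See the Plotly documentation on line shapes for more details and other options; besides 'hv', line_shape also accepts 'linear', 'spline', 'vh', 'hvh', and 'vhv'.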
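
If the filled-in numbers themselves are needed, for example to compare Actual against Pred on every date, the same "hold the last value" logic can be reproduced in pandas. The following is only a minimal sketch under that assumption, not part of the answer above; the column name Actual_ffill is made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Time": ["2020-06-01", "2020-06-02", "2020-06-03",
             "2020-06-04", "2020-06-05", "2020-06-06"],
    "Pred": [100, -200, 300, -400, -500, 600],
})
df2 = pd.DataFrame({
    "Time": ["2020-06-01", "2020-06-03", "2020-06-06"],
    "Actual": [90, 280, 650],
})

# Align the sparse series to the dense dates (a left merge keeps df1's
# row order), leaving NaN on dates that have no recorded Actual.
merged = df1.merge(df2, on="Time", how="left")

# Carry the most recent recorded value forward over the gaps.
# "Actual_ffill" is a hypothetical column name used only in this sketch.
merged["Actual_ffill"] = merged["Actual"].ffill()

print(merged[["Time", "Pred", "Actual_ffill"]])
```

Plotting Actual_ffill with the default linear line shape would still draw a slanted segment between consecutive days whose values differ, so line_shape='hv' remains the simpler route when only the chart matters.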
