Mathias Rönnlund
Mathias Rönnlund

Reputation: 4817

Fill empty fields with previous values in Kusto query in Azure Data Explorer

Background

We have a dataset with the following format in Azure Data Explorer.

sensorid timestamp value
valve1 24-03-2021 123
valve1 23-03-2021 234
cylinderspeed 23-03-2021 1.2
valvestatus 23-03-2021 open
valvestatus 24-03-2021 closed
cylinderspeed 25-03-2021 2

The different sensors have different reporting intervals, some report every second, some a few times per day.

By using this query

datatable (sourcetimestamp: datetime, sensorid:string, value:dynamic)
    [datetime(2021-03-23), "valve1", 123,
    datetime(2021-03-24), "valve1", 234,
    datetime(2021-03-23), "cylinderspeed", 1.2,
    datetime(2021-03-23), "valvestatus", "open",
    datetime(2021-03-24), "valvestatus", "closed",
    datetime(2021-03-25), "cylinderspeed", 2]
| summarize average=any(value) by bin(sourcetimestamp, 1s), sensorid
| evaluate pivot(sensorid, any(average))

I can generate this table

timestamp valve1 cylinderspeed valvestatus
23-03-2021 123 1,2 open
24-03-2021 234 closed
25-03-2021 2

The problem

How can I continue on the above query so I can fill empty cells with the previous value for from that column?

Upvotes: 2

Views: 6272

Answers (3)

Not_a_programmer
Not_a_programmer

Reputation: 171

You can use the scan operator for this. This is copied from the official documentation.

let Events = datatable (Ts: timespan, Event: string) [
    0m, "A",
    1m, "",
    2m, "B",
    3m, "",
    4m, "",
    6m, "C",
    8m, "",
    11m, "D",
    12m, ""
]
;
Events
| sort by Ts asc
| scan declare (Event_filled: string="") with 
(
    step s1: true => Event_filled = iff(isempty(Event), s1.Event_filled, Event);
)

Upvotes: 0

Andre
Andre

Reputation: 141

You can use the python plugin for that. You first need to enable the plugin in the Azure Portal in the configuration tab of your cluster. I don't know why the Kusto team is taking so long for such a simple operation. I'm waiting for a generic "ffill" since years:

.create-or-alter function  ffill(T:(*)) {
T
| evaluate python(
typeof(*),
'result = df.ffill()'
)
}

datatable (num1: int, num2: int)
[
1,0,
int(null), 1,
int(null), 0,
2, int(null),
3, int(null)
]
| invoke ffill()

Upvotes: 0

Avnera
Avnera

Reputation: 7608

You can use one of the series_fill functions such as series_fill_forward. Note that the easiest way to get the arrays to fill is by using the make-series operator

since timeseries expects numeric values in the series I translated the enum of the valvestatus to double.

datatable (sourcetimestamp: datetime, sensorid:string, value:dynamic)
[datetime(2021-03-23), "valve1", 123, 
datetime(2021-03-24), "valve1", 234, 
datetime(2021-03-23), "valvestatus", "open",
datetime(2021-03-24), "valvestatus", "closed",
datetime(2021-03-23), "cylinderspeed", 1.2,
datetime(2021-03-24), "cylinderspeed", 2]
| extend value = case(value=="open", double(1), value=="closed", double(0), value)
| make-series values = any(value) default=double(null) on sourcetimestamp from(datetime(2021-03-23 00:00:00.0000000)) to(datetime(2021-03-24 00:00:00.0000000)) step 1h by sensorid
| extend values = series_fill_forward(values)

Upvotes: 3

Related Questions