Reputation: 4817
Background
We have a dataset with the following format in Azure Data Explorer.
sensorid | timestamp | value |
---|---|---|
valve1 | 24-03-2021 | 123 |
valve1 | 23-03-2021 | 234 |
cylinderspeed | 23-03-2021 | 1.2 |
valvestatus | 23-03-2021 | open |
valvestatus | 24-03-2021 | closed |
cylinderspeed | 25-03-2021 | 2 |
The different sensors have different reporting intervals, some report every second, some a few times per day.
By using this query
datatable (sourcetimestamp: datetime, sensorid:string, value:dynamic)
[datetime(2021-03-23), "valve1", 123,
datetime(2021-03-24), "valve1", 234,
datetime(2021-03-23), "cylinderspeed", 1.2,
datetime(2021-03-23), "valvestatus", "open",
datetime(2021-03-24), "valvestatus", "closed",
datetime(2021-03-25), "cylinderspeed", 2]
| summarize average=any(value) by bin(sourcetimestamp, 1s), sensorid
| evaluate pivot(sensorid, any(average))
I can generate this table
timestamp | valve1 | cylinderspeed | valvestatus |
---|---|---|---|
23-03-2021 | 123 | 1,2 | open |
24-03-2021 | 234 | closed | |
25-03-2021 | 2 |
The problem
How can I continue on the above query so I can fill empty cells with the previous value for from that column?
Upvotes: 2
Views: 6272
Reputation: 171
You can use the scan
operator for this. This is copied from the official documentation.
let Events = datatable (Ts: timespan, Event: string) [
0m, "A",
1m, "",
2m, "B",
3m, "",
4m, "",
6m, "C",
8m, "",
11m, "D",
12m, ""
]
;
Events
| sort by Ts asc
| scan declare (Event_filled: string="") with
(
step s1: true => Event_filled = iff(isempty(Event), s1.Event_filled, Event);
)
Upvotes: 0
Reputation: 141
You can use the python plugin for that. You first need to enable the plugin in the Azure Portal in the configuration tab of your cluster. I don't know why the Kusto team is taking so long for such a simple operation. I'm waiting for a generic "ffill" since years:
.create-or-alter function ffill(T:(*)) {
T
| evaluate python(
typeof(*),
'result = df.ffill()'
)
}
datatable (num1: int, num2: int)
[
1,0,
int(null), 1,
int(null), 0,
2, int(null),
3, int(null)
]
| invoke ffill()
Upvotes: 0
Reputation: 7608
You can use one of the series_fill functions such as series_fill_forward. Note that the easiest way to get the arrays to fill is by using the make-series operator
since timeseries expects numeric values in the series I translated the enum of the valvestatus to double.
datatable (sourcetimestamp: datetime, sensorid:string, value:dynamic)
[datetime(2021-03-23), "valve1", 123,
datetime(2021-03-24), "valve1", 234,
datetime(2021-03-23), "valvestatus", "open",
datetime(2021-03-24), "valvestatus", "closed",
datetime(2021-03-23), "cylinderspeed", 1.2,
datetime(2021-03-24), "cylinderspeed", 2]
| extend value = case(value=="open", double(1), value=="closed", double(0), value)
| make-series values = any(value) default=double(null) on sourcetimestamp from(datetime(2021-03-23 00:00:00.0000000)) to(datetime(2021-03-24 00:00:00.0000000)) step 1h by sensorid
| extend values = series_fill_forward(values)
Upvotes: 3