Aparna
Aparna

Reputation: 845

Azure Stream analytics query not returning result set when using timestamp by

I am trying to extract the parts per minute produced where v is an aggregated counter of the parts produced till that time.

My azure SQL Query is as below

select 
x.fqn,
( max(cast(y.arrayvalue.v as BIGINT))-(min(cast(y.arrayvalue.v as BIGINT)))) as ppm
 from
(SELECT
   TS.ArrayIndex,
   TS.ArrayValue.FQN,
   TS.ArrayValue.vqts
FROM
[EventHubInput] as hub    
timestamp by y.arrayvalue.t
CROSS APPLY GetArrayElements(hub.timeseries) AS TS) as x
cross apply GetArrayElements(x.vqts) AS y
where x.fqn like '%Production%' and y.arrayvalue.q=192
group by tumblingwindow(minute,1),x.fqn

My input data looks like this

{
 "timeSeries": [
{
    "fqn":"MyEnterprise.Gateways.GatewayE.CLX.Tags.StateBasic",
        "vqts":[
                {
                "v": "" ,
                "q": 192 ,
                "t":"2016-06-24T16:39:45.683+0000"
                }
    ]
},              {
    "fqn":"MyEnterprise.Gateways.GatewayE.CLX.Tags.ProductionCount",
        "vqts":[
                {
                "v": 264 ,
                "q": 192 ,
                "t":"2016-06-24T16:39:45.683+0000"
                }
    ]
},              {
    "fqn":".Gateways.GatewayE.CLX.Tags.StateDetailed",
        "vqts":[
                {
                "v": "" ,
                "q": 192 ,
                "t":"2016-06-24T16:39:45.683+0000"
                }
    ]
}           ]

My query returns no result. when I remove the timestamp by y.arrayvalue.t and add y.arrayvalue.t in the group by clause, I get some result. I realize that maybe this is because I have more than 1 timestamp field for each event, So I wanted to know if it is possible to assign the time data of the first array to timestamp by...something like timestamp by y[0].t

Upvotes: 0

Views: 767

Answers (1)

MinHe-MSFT
MinHe-MSFT

Reputation: 279

As of today, Azure Stream Analytics does not support timestamp by over a value inside an array. So the answer to your question "if it is possible to assign the time data of the first array to timestamp by" is NO.

Here is a workaround you can use:

First, flatten the input message in one job and output to a staging Event Hub:

WITH flattenTS AS
(
    SELECT
       TS.ArrayIndex,
       TS.ArrayValue.FQN,
       TS.ArrayValue.vqts
    FROM [EventHubInput]
    CROSS APPLY GetArrayElements(hub.timeseries) AS TS
)
, flattenVQTS AS
(
    SELECT
        ArrayIndex
        ,FQN
        ,vqts.ArrayValue.v as v
        ,vqts.ArrayValue.q as q
        ,vqts.ArrayValue.t as t
    FROM flattenTS TS
    CROSS APPLY GetArrayElements(TS.vqts) AS vqts
)

SELECT *
INTO [staging_eventhub]
FROM flattenVQTS

Then, use another job to read the flattened messages and do the windowed aggregation:

SELECT
    FQN
    ,MAX(CAST(v as BIGINT))-MIN(CAST(v as BIGINT)) as ppm
FROM [staging_eventhub] timestamp by t
WHERE fqn LIKE '%Production%' and q=192
GROUP BY tumblingwindow(minute,1), fqn

You may wonder can we just combine above two jobs as multiple steps in a single job and avoid the staging Event Hub. Unfortunately, you cannot use "timestamp by" when you select from CTE or subquery today.

Upvotes: 4

Related Questions