Damo
Damo

Reputation: 2070

Azure Synapse query to multiple JSON files in data lake, parse data into table format

I want to query RAW telemetry data stored in JSON format in my data lake. I am using Azure Synapse to query the files.

I can successfully pull the data I want into a result table, but I struggle to understand how to take all the rows in the result and apply the JSON -> table transformation across them.

I can successfully do this on 1 row, but when I attempt to do this across multiple, I get the error:

Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

I suspect I need a CROSS APPLY, but I'm at the end of my knowledge. Help appreciated.

QUERY

DECLARE @json2 nvarchar(MAX) = (
SELECT TOP 10 *
FROM
    OPENROWSET(
        BULK 'https://xxx.dfs.core.windows.net/raw/100/103/xxxx/2023/10/23/*.json',
        FORMAT = 'CSV',
        FIELDQUOTE = '0x0b',
        FIELDTERMINATOR ='0x0b',
        ROWTERMINATOR = '0x0b'
    )
    WITH (
        json2 varchar(MAX)
    ) AS json2
)
SELECT JSON_VALUE(@json2, '$.DeviceId') AS Device, [MeasureType], [MeasureValue],[DataPointTimestamp]
FROM OPENJSON(@json2, '$.MeasureList') WITH (
   [MeasureType] nvarchar(max) '$.MeasureType',
   [MeasureValue] nvarchar(max) '$.MeasureValue',
   [DataPointTimestamp] nvarchar(max) '$.DataPointTimestamp'  
)

DATA Example - 2 rows

╔═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║ {    "DeviceId": "00000",    "MeasureList": [      {        "MeasureType": 24,        "MeasureValue": 23.3,        "DataPointTimestamp": "2023-10-23T09:58:48.2962707+00:00"      },      {        "MeasureType": 7,        "MeasureValue": 5787.0,        "DataPointTimestamp": "2023-10-23T09:58:48.296772+00:00"      },      {        "MeasureType": 3,        "MeasureValue": 2.304,        "DataPointTimestamp": "2023-10-23T09:58:48.296772+00:00"      },      {        "MeasureType": 4,        "MeasureValue": 30.0,        "DataPointTimestamp": "2023-10-23T09:58:48.296772+00:00"      },      {        "MeasureType": 1,        "MeasureValue": 37.0,        "DataPointTimestamp": "2023-10-23T09:58:48.2977765+00:00"      },      {        "MeasureType": 22,        "MeasureValue": 55.0,        "DataPointTimestamp": "2023-10-23T09:58:48.2977765+00:00"      },      {        "MeasureType": 20,        "MeasureValue": 0.0,        "DataPointTimestamp": "2023-10-23T09:58:48.2977765+00:00"      },      {        "MeasureType": 18,        "MeasureValue": 1.0,        "DataPointTimestamp": "2023-10-23T09:58:48.2977765+00:00"      },      {        "MeasureType": 9,        "MeasureValue": -0.112031,        "DataPointTimestamp": "2023-10-23T09:58:48.6572325+00:00"      },      {        "MeasureType": 10,        "MeasureValue": 51.503401,        "DataPointTimestamp": "2023-10-23T09:58:48.6572325+00:00"      },      {        "MeasureType": 8,        "MeasureValue": 28.3871,        "DataPointTimestamp": "2023-10-23T09:58:48.911908+00:00"      },      {        "MeasureType": 17,        "MeasureValue": 49.0,        "DataPointTimestamp": "2023-10-23T09:58:48.911908+00:00"      },      {        "MeasureType": 12,        "MeasureValue": 46369.09568,        "DataPointTimestamp": "2023-10-23T09:58:49.3340307+00:00"      }    ],    "EventList": [],    "InformationList": []  }     ║
║ {    "DeviceId": "0000",    "MeasureList": [      {        "MeasureType": 24,        "MeasureValue": 23.3,        "DataPointTimestamp": "2023-10-23T10:48:48.3482369+00:00"      },      {        "MeasureType": 7,        "MeasureValue": 5811.0,        "DataPointTimestamp": "2023-10-23T10:48:48.3482369+00:00"      },      {        "MeasureType": 3,        "MeasureValue": 2.304,        "DataPointTimestamp": "2023-10-23T10:48:48.3482369+00:00"      },      {        "MeasureType": 4,        "MeasureValue": 30.0,        "DataPointTimestamp": "2023-10-23T10:48:48.3482369+00:00"      },      {        "MeasureType": 1,        "MeasureValue": 34.0,        "DataPointTimestamp": "2023-10-23T10:48:48.3492387+00:00"      },      {        "MeasureType": 22,        "MeasureValue": 55.0,        "DataPointTimestamp": "2023-10-23T10:48:48.3492387+00:00"      },      {        "MeasureType": 20,        "MeasureValue": 0.0,        "DataPointTimestamp": "2023-10-23T10:48:48.3492387+00:00"      },      {        "MeasureType": 18,        "MeasureValue": 1.0,        "DataPointTimestamp": "2023-10-23T10:48:48.3492387+00:00"      },      {        "MeasureType": 9,        "MeasureValue": -0.112204,        "DataPointTimestamp": "2023-10-23T10:48:48.7427858+00:00"      },      {        "MeasureType": 10,        "MeasureValue": 51.503366,        "DataPointTimestamp": "2023-10-23T10:48:48.7427858+00:00"      },      {        "MeasureType": 8,        "MeasureValue": 28.374,        "DataPointTimestamp": "2023-10-23T10:48:48.9999703+00:00"      },      {        "MeasureType": 17,        "MeasureValue": 49.0,        "DataPointTimestamp": "2023-10-23T10:48:48.9999703+00:00"      },      {        "MeasureType": 12,        "MeasureValue": 46369.357824,        "DataPointTimestamp": "2023-10-23T10:48:49.4220929+00:00"      }    ],    "EventList": [],    "InformationList": []  } ║
╚═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝

Desired result

╔════════╦═════════════╦═══════════════╦═══════════════════════════════════╗
║ Device ║ MeasureType ║ Measure Value ║        DataPointTimestamp         ║
╠════════╬═════════════╬═══════════════╬═══════════════════════════════════╣
║   0000 ║          11 ║ 128034.0      ║ 2023-10-23T07:28:30.6775066+00:00 ║
║   0000 ║           6 ║ 8258.0        ║ 2023-10-23T07:28:30.6775066+00:00 ║
║   0000 ║          24 ║ 19.6          ║ 2023-10-23T07:28:30.6775066+00:00 ║
║   0000 ║          11 ║ 11.0          ║ 2023-10-23T07:38:30.6775066+00:00 ║
║   0000 ║           6 ║ 99999         ║ 2023-10-23T07:38:30.6775066+00:00 ║
║   0000 ║          24 ║ 29.6          ║ 2023-10-23T07:38:30.6775066+00:00 ║
╚════════╩═════════════╩═══════════════╩═══════════════════════════════════╝

Upvotes: 0

Views: 565

Answers (1)

Aswin
Aswin

Reputation: 7156

The error is in the query that assigns the JSON data to the json2 variable. This subquery is returning multiple rows of data, which is causing the error when you try to use the json2 variable in the subsequent SELECT statement. To solve this error, you can directly read the Json and apply cross apply without storing them in Json variable. Below is the query.

Code

SELECT 
    JSON_VALUE(json2, '$.DeviceId') AS Device, 
    [MeasureType], 
    [MeasureValue],
    [DataPointTimestamp]
FROM 
    OPENROWSET(
        BULK 'https://xxx.dfs.core.windows.net/raw/100/103/xxxx/2023/10/23/*.json',
        FORMAT = 'CSV',
        FIELDQUOTE = '0x0b',
        FIELDTERMINATOR ='0x0b',
        ROWTERMINATOR = '0x0b'
    )
    WITH (
        json2 varchar(MAX)
    ) AS json2
CROSS APPLY 
    OPENJSON(json2, '$.MeasureList') WITH (
        [MeasureType] nvarchar(max) '$.MeasureType',
        [MeasureValue] nvarchar(max) '$.MeasureValue',
        [DataPointTimestamp] nvarchar(max) '$.DataPointTimestamp'  
    ) AS j

enter image description here

Output for sample input data:

Device MeasureType MeasureValue DataPointTimestamp
0000 1 34.0 2023-10-23T10:48:48.3492387+00:00
0000 10 51.503366 2023-10-23T10:48:48.7427858+00:00
0000 12 46369.357824 2023-10-23T10:48:49.4220929+00:00
0000 17 49.0 2023-10-23T10:48:48.9999703+00:00
0000 18 1.0 2023-10-23T10:48:48.3492387+00:00
0000 20 0.0 2023-10-23T10:48:48.3492387+00:00
0000 22 55.0 2023-10-23T10:48:48.3492387+00:00
0000 24 23.3 2023-10-23T10:48:48.3482369+00:00
0000 3 2.304 2023-10-23T10:48:48.3482369+00:00
0000 4 30.0 2023-10-23T10:48:48.3482369+00:00
0000 7 5811.0 2023-10-23T10:48:48.3482369+00:00
0000 8 28.374 2023-10-23T10:48:48.9999703+00:00

Upvotes: 4

Related Questions