Reading JSON in Azure Synapse

Question

I'm trying to understand the code for reading JSON file in Synapse Analytics. And here's the code provided by Microsoft documentation: Query JSON files using serverless SQL pool in Azure Synapse Analytics

select top 10 *
from openrowset(
        bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.jsonl',
        format = 'csv',
        fieldterminator ='0x0b',
        fieldquote = '0x0b'
    ) with (doc nvarchar(max)) as rows
go

I wonder why the format = 'csv'. Is it trying to convert JSON to CSV to flatten the file?

Ronen Ariely · Accepted Answer

Why they didn't just read the file as a SINGLE_CLOB I don't know

When you use SINGLE_CLOB then the entire file is important as one value and the content of the file in the doc is not well formatted as a single JSON. Using SINGLE_CLOB will make us do more work after using the openrowset, before we can use the content as JSON (since it is not valid JSON we will need to parse the value). It can be done but will require more work probably.

The format of the file is multiple JSON's like strings, each in separate line. "line-delimited JSON", as the document call it.

By the way, If you will check the history of the document at GitHub, then you will find that originally this was not the case. As much as I remember, originally the file included a single JSON document with an array of objects (was wrapped with [] after loaded). Someone named "Ronen Ariely" in fact found this issue in the document, which is why you can see my name in the list if the Authors of the document :-)

I wonder why the format = 'csv'. Is it trying to convert json to csv to flatten the hierarchy?

(1) JSON is not a data type in SQL Server. There is no data type name JSON. What we have in SQL Server are tools like functions which work on text and provide support for strings which are JSON's like format. Therefore, we do not CONVERT to JSON or from JSON.

(2) The format parameter has nothing to do with JSON. It specifies that the content of the file is a comma separated values file. You can (and should) use it whenever your file is well formatted as comma separated values file (also commonly known as csv file).

In this specific sample in the document, the values in the csv file are strings, which each one of them has a valid JSON format. Only after you read the file using the openrowset, we start to parse the content of the text as JSON.

Notice that only after the title "Parse JSON documents" in the document, the document starts to speak about parsing the text as JSON.

Reading JSON in Azure Synapse

Answers (1)

Related Questions