Scott J

Reputation: 25

Can I batch Snowflake JSON output into separate rows based on LIMIT/OFFSET?

I have a query in Snowflake that produces JSON output, one record per row, 29,000+ rows in total. I am using this query to create records in an MDM system. However, this ingestion process is a bit inefficient and costly. I have since taken my final SELECT statement that created the JSON and placed it in a CTE, then made a new final SELECT statement:

SELECT ARRAY_AGG(*) as clin_prov
FROM clin_prov

This overwhelmed the system because of the output's size. Using a LIMIT of 1000, and even 5000, in the CTE produced results, making one large JSON record, as expected and desired. I'd like the final output to be about 30 rows of arrays, each with <= 1000 JSON records. Is there a way to do that? Can I create arrays based on LIMIT and OFFSET and feed them into separate rows?

I tried with and without LIMIT in the clin_prov CTE. I also tried separate CTEs based on LIMIT and OFFSET combined with UNION ALL, roughly like the sketch below; that overloaded the system as well.
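For context, a minimal sketch of that LIMIT/OFFSET attempt, assuming the original query lives in a CTE named clin_prov_base with a JSON column json_rec (both names are placeholders):

WITH clin_prov_base AS (
    SELECT ... AS json_rec FROM ...
),
batch1 AS (SELECT json_rec FROM clin_prov_base LIMIT 1000 OFFSET 0),
batch2 AS (SELECT json_rec FROM clin_prov_base LIMIT 1000 OFFSET 1000)
-- ...repeated for every batch
SELECT ARRAY_AGG(json_rec) AS clin_prov FROM batch1
UNION ALL
SELECT ARRAY_AGG(json_rec) AS clin_prov FROM batch2;

Besides requiring ~30 hand-written CTEs, this has a correctness problem: without an ORDER BY, rows are not guaranteed to come back in a stable order, so LIMIT/OFFSET batches can overlap or miss rows.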

Upvotes: 2

Views: 57

Answers (2)

Lukasz Szozda

Reputation: 176124

Rows can be grouped into "buckets" of a predefined size; the divisor inside CEIL sets the bucket size. Example with 30 rows per bucket:

WITH cte AS ( 
    SELECT CEIL(ROW_NUMBER() OVER(ORDER BY NULL) / 30) AS grp, * 
    FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF10.ORDERS 
) 
SELECT grp, ARRAY_AGG(OBJECT_CONSTRUCT(*)) 
FROM cte 
GROUP BY grp;
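Applied to the question's setup, a sketch with buckets of at most 1000 rows (clin_prov stands for the original CTE, and json_rec is a placeholder for its JSON column):

WITH clin_prov AS (
    SELECT ... AS json_rec FROM ...   -- original final SELECT producing the JSON rows
),
numbered AS (
    -- CEIL(row_number / 1000) assigns rows 1-1000 to group 1, 1001-2000 to group 2, etc.
    SELECT CEIL(ROW_NUMBER() OVER (ORDER BY NULL) / 1000) AS grp, json_rec
    FROM clin_prov
)
SELECT grp, ARRAY_AGG(json_rec) AS clin_prov_batch
FROM numbered
GROUP BY grp
ORDER BY grp;
-- 29,000+ rows yield ~30 groups of <= 1000 records each

One detail of the ORDERS example above: because grp is a column of the CTE, OBJECT_CONSTRUCT(*) also picks it up, so each object in the arrays carries a GRP key. Aggregating a single JSON column, as sketched here, avoids that.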

Upvotes: 1

keithwalsh

Reputation: 881

Yes, you can batch. Try the query below, changing the NTILE value to set the number of batches.

WITH clin_prov AS (
  SELECT 
    ..., 
    NTILE(50) OVER (ORDER BY some_column) AS batch
  FROM 
    ...
)
SELECT batch, ARRAY_AGG(your_json_column) AS clin_prov
FROM clin_prov
GROUP BY batch
ORDER BY batch
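For ~29,000 rows, NTILE(50) produces 50 batches of roughly 580-600 rows each. The trade-off versus the other answer: NTILE fixes the number of batches and lets the batch size vary, while the CEIL(ROW_NUMBER() / n) approach fixes the batch size and lets the number of batches vary. Pick whichever constraint (batch count or batch size) matters to the MDM ingestion.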

Upvotes: 1
