Matt Redlon
Matt Redlon

Reputation: 29

BigQuery Materialized View of a STRUCT

We are trying to create a materialized view of a large BQ table. The table receives a high volume of streaming web activity inserts, is multi-tenant, and really leverages BQ's nested columnar structure.

We want to create a subset of this table for more efficient, near-real time query execution with minimal administrative overhead. We thought the simplest solution would be to create a materialized view which is just a subset of rows (by client) and columns, but currently materialized views require aggregation.

Additionally, the materialized view beta supports a limited set of aggregation functions and does not support sub-selects or UNNEST operations. We have not found a good method of extracting the deeply nested STRUCTs into the materialized view. A simple example:

SELECT 
  '7602E3E96349E972' as session_id,
  '084F0262' as transaction_id,
  [STRUCT(
    [STRUCT(
      'promotions' as name,
      ['SAVE50'] as value), 
      STRUCT(
        'discounts' as name,
        ['9.99'] as value)
    ] as modifiers
  )] as contexts_transaction
UNION ALL
SELECT 
  '7602E3E96349E972' as session_id,
  '01ECB6EF' as transaction_id,
  [STRUCT(
    [STRUCT(
      'promotions' as name,
      ['SPRING','LOVE'] as value), 
      STRUCT(
        'discounts' as name,
        ['14.99','6.99'] as value)
    ] as modifiers
  )] as contexts_transaction
UNION ALL
SELECT 
  '508082BC49BAC09F' as session_id,
  '038B67CF' as transaction_id,
  [STRUCT(
    [STRUCT(
      'promotions' as name,
      ['FREESHIP','HOLIDAY25'] as value), 
      STRUCT(
        'discounts' as name,
        ['9.99'] as value)
    ] as transaction
  )] as contexts_transaction
UNION ALL
SELECT 
  'C88AE153C784D910' as session_id,
  'EA716BD2' as transaction_id,
  [STRUCT(
    [STRUCT(
      'promotions' as name,
      ['CYBER'] as value), 
      STRUCT(
        'discounts' as name,
        ['9.99','19.99'] as value)
    ] as modifiers
  )]

In that ideally we would retain this STRUCT as is, we are trying to accomplish something like this in the materialized view (recognizing these are not supported MV features):

SELECT
session_id,
transaction_id,
ARRAY_AGG(STRUCT<name STRING, value ARRAY<STRING>>(mods_array.name,mods_array.value)) as modifiers
FROM data,
UNNEST(contexts_transaction) trans_array,
UNNEST(trans_array.modifiers) mods_array
GROUP BY 1,2

We are open to any method of subsetting this massive table, not just MV, but would love it to have the same benefits (low maintenance, automatic, low cost). Any suggestions appreciated!

Upvotes: 1

Views: 1697

Answers (1)

Sabri Karag&#246;nen
Sabri Karag&#246;nen

Reputation: 2365

As far as I could understand from your question, you want to have a similar output to this:

with rawdata AS
(
  SELECT 1 as userid, [STRUCT('transactionIds' as name, ['ABCDEF'] as value), STRUCT('couponIds' as name, ['123456'] as value)] as transactions union all
  SELECT 1 as userid, [STRUCT('transactionIds' as name, ['XYZ', 'KLM'] as value), STRUCT('couponIds' as name, ['789', '567'] as value)] union all
  SELECT 2 as userid, [STRUCT('transactionIds' as name, ['XY', 'KL'] as value), STRUCT('couponIds' as name, ['10', '15'] as value)] union all
  SELECT 2 as userid, [STRUCT('transactionIds' as name, ['X', 'K'] as value), STRUCT('couponIds' as name, ['20', '25'] as value)]
)
select 
  userid,
  ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'transactionIds')) as transactionIds,
  ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'couponIds')) as couponIds
from rawdata
group by userid;

So, input table looks like this
Sample input table

While, output table looks like
Sample output table

If your intention is different, please state it in the question with more details.

For this purpose, I tried to create that query as a materialized view.

create or replace table project.dataset.rawdata as 
  SELECT 1 as userid, [STRUCT('transactionIds' as name, ['ABCDEF'] as value), STRUCT('couponIds' as name, ['123456'] as value)] as transactions union all
  SELECT 1 as userid, [STRUCT('transactionIds' as name, ['XYZ', 'KLM'] as value), STRUCT('couponIds' as name, ['789', '567'] as value)] union all
  SELECT 2 as userid, [STRUCT('transactionIds' as name, ['XY', 'KL'] as value), STRUCT('couponIds' as name, ['10', '15'] as value)] union all
  SELECT 2 as userid, [STRUCT('transactionIds' as name, ['X', 'K'] as value), STRUCT('couponIds' as name, ['20', '25'] as value)]
;

create materialized view project.dataset.mview as 
select 
  userid,
  ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'transactionIds')) as transactionIds,
  ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'couponIds')) as couponIds
from project.dataset.rawdata
GROUP BY userid

However, I get the error Unsupported aggregation function in materialized view: array_concat_agg.. Since materialized views are beta yet, we don't know if it's going to be supported in the future. However, it's not possible to do it with current capabilities.

@fhoffa can tell more about it, maybe.

Upvotes: 0

Related Questions