Reputation: 29
We are trying to create a materialized view of a large BQ table. The table receives a high volume of streaming web activity inserts, is multi-tenant, and really leverages BQ's nested columnar structure.
We want to create a subset of this table for more efficient, near-real time query execution with minimal administrative overhead. We thought the simplest solution would be to create a materialized view which is just a subset of rows (by client) and columns, but currently materialized views require aggregation.
Additionally, the materialized view beta supports a limited set of aggregation functions and does not support sub-selects or UNNEST operations. We have not found a good method of extracting the deeply nested STRUCTs into the materialized view. A simple example:
SELECT
'7602E3E96349E972' as session_id,
'084F0262' as transaction_id,
[STRUCT(
[STRUCT(
'promotions' as name,
['SAVE50'] as value),
STRUCT(
'discounts' as name,
['9.99'] as value)
] as modifiers
)] as contexts_transaction
UNION ALL
SELECT
'7602E3E96349E972' as session_id,
'01ECB6EF' as transaction_id,
[STRUCT(
[STRUCT(
'promotions' as name,
['SPRING','LOVE'] as value),
STRUCT(
'discounts' as name,
['14.99','6.99'] as value)
] as modifiers
)] as contexts_transaction
UNION ALL
SELECT
'508082BC49BAC09F' as session_id,
'038B67CF' as transaction_id,
[STRUCT(
[STRUCT(
'promotions' as name,
['FREESHIP','HOLIDAY25'] as value),
STRUCT(
'discounts' as name,
['9.99'] as value)
] as transaction
)] as contexts_transaction
UNION ALL
SELECT
'C88AE153C784D910' as session_id,
'EA716BD2' as transaction_id,
[STRUCT(
[STRUCT(
'promotions' as name,
['CYBER'] as value),
STRUCT(
'discounts' as name,
['9.99','19.99'] as value)
] as modifiers
)]
In that ideally we would retain this STRUCT as is, we are trying to accomplish something like this in the materialized view (recognizing these are not supported MV features):
SELECT
session_id,
transaction_id,
ARRAY_AGG(STRUCT<name STRING, value ARRAY<STRING>>(mods_array.name,mods_array.value)) as modifiers
FROM data,
UNNEST(contexts_transaction) trans_array,
UNNEST(trans_array.modifiers) mods_array
GROUP BY 1,2
We are open to any method of subsetting this massive table, not just MV, but would love it to have the same benefits (low maintenance, automatic, low cost). Any suggestions appreciated!
Upvotes: 1
Views: 1697
Reputation: 2365
As far as I could understand from your question, you want to have a similar output to this:
with rawdata AS
(
SELECT 1 as userid, [STRUCT('transactionIds' as name, ['ABCDEF'] as value), STRUCT('couponIds' as name, ['123456'] as value)] as transactions union all
SELECT 1 as userid, [STRUCT('transactionIds' as name, ['XYZ', 'KLM'] as value), STRUCT('couponIds' as name, ['789', '567'] as value)] union all
SELECT 2 as userid, [STRUCT('transactionIds' as name, ['XY', 'KL'] as value), STRUCT('couponIds' as name, ['10', '15'] as value)] union all
SELECT 2 as userid, [STRUCT('transactionIds' as name, ['X', 'K'] as value), STRUCT('couponIds' as name, ['20', '25'] as value)]
)
select
userid,
ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'transactionIds')) as transactionIds,
ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'couponIds')) as couponIds
from rawdata
group by userid;
So, input table looks like this
While, output table looks like
If your intention is different, please state it in the question with more details.
For this purpose, I tried to create that query as a materialized view.
create or replace table project.dataset.rawdata as
SELECT 1 as userid, [STRUCT('transactionIds' as name, ['ABCDEF'] as value), STRUCT('couponIds' as name, ['123456'] as value)] as transactions union all
SELECT 1 as userid, [STRUCT('transactionIds' as name, ['XYZ', 'KLM'] as value), STRUCT('couponIds' as name, ['789', '567'] as value)] union all
SELECT 2 as userid, [STRUCT('transactionIds' as name, ['XY', 'KL'] as value), STRUCT('couponIds' as name, ['10', '15'] as value)] union all
SELECT 2 as userid, [STRUCT('transactionIds' as name, ['X', 'K'] as value), STRUCT('couponIds' as name, ['20', '25'] as value)]
;
create materialized view project.dataset.mview as
select
userid,
ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'transactionIds')) as transactionIds,
ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'couponIds')) as couponIds
from project.dataset.rawdata
GROUP BY userid
However, I get the error Unsupported aggregation function in materialized view: array_concat_agg.
.
Since materialized views are beta yet, we don't know if it's going to be supported in the future. However, it's not possible to do it with current capabilities.
@fhoffa can tell more about it, maybe.
Upvotes: 0