Reputation: 1
I'm trying to calculate median on different grouping sets in Vertica and I get the following error:
[0A000][6467] [Vertica]VJDBC ERROR: User defined aggregate cannot be used in query with MLAs"
this is a template of my query which doesn't work:
select field_a, field_b, APPROXIMATE_MEDIAN(field_c)
from my_table
group by cube (field_a, field_b)
where in this one, when I group by just one field, it works just fine:
select field_a, APPROXIMATE_MEDIAN(field_c)
from my_table
group by field_a
Would love to get any idea for why this is happening and how can I rewrite my query so it'll work. My next step is to union the calculations per every grouping, but I hope to find a more elegant solution.
Thanks!
tried to search for this error, tried to run on a smaller case, tried to think how can I get the same result using the analytic version of median, but couldn't find anything that supplies my need and can't find the logic behind why it does not work.
Upvotes: 0
Views: 149
Reputation: 6749
You must remember that GROUP BY CUBE
or GROUP BY ROLLUP
often are multi-pass query access plans, even if you don't code them as such.
And it does not hurt, as Vertica's query engine has pipeline parallelism - many parts of the plan are executed by different threads in parallel. So while it might look awkward to see it in code, it's not so un-elegant after all if you UNION SELECT
all levels of aggregation together:
WITH
indata(fld_a,fld_b,fld_c) AS (
SELECT 1,1, 1
UNION ALL SELECT 1,2, 2
UNION ALL SELECT 1,3, 3
UNION ALL SELECT 1,4, 4
UNION ALL SELECT 1,5, 5
UNION ALL SELECT 2,1, 10
UNION ALL SELECT 2,2, 20
UNION ALL SELECT 2,3, 30
UNION ALL SELECT 3,4, 40
UNION ALL SELECT 3,5, 50
UNION ALL SELECT 3,1,100
UNION ALL SELECT 3,2,200
UNION ALL SELECT 3,3,300
UNION ALL SELECT 3,4,400
UNION ALL SELECT 3,5,500
)
SELECT
'fld_a' AS grain
, fld_a
, NULL::int AS fld_b
, APPROXIMATE_MEDIAN(fld_c) AS app_med_c
FROM indata
GROUP BY fld_a
UNION ALL
SELECT
'fld_b' AS grain
, NULL::int AS fld_a
, fld_b
, APPROXIMATE_MEDIAN(fld_c) AS app_med_c
FROM indata
GROUP BY fld_b
UNION ALL
SELECT
'both' AS grain
, fld_a
, fld_b
, APPROXIMATE_MEDIAN(fld_c) AS app_med_c
FROM indata
GROUP BY fld_a,fld_b
ORDER BY 1,2,3
;
Returning:
grain | fld_a | fld_b | app_med_c |
---|---|---|---|
both | 1 | 1 | 1 |
both | 1 | 2 | 2 |
both | 1 | 3 | 3 |
both | 1 | 4 | 4 |
both | 1 | 5 | 5 |
both | 2 | 1 | 10 |
both | 2 | 2 | 20 |
both | 2 | 3 | 30 |
both | 3 | 1 | 100 |
both | 3 | 2 | 200 |
both | 3 | 3 | 300 |
both | 3 | 4 | 220 |
both | 3 | 5 | 275 |
fld_a | 1 | (null) | 3 |
fld_a | 2 | (null) | 20 |
fld_a | 3 | (null) | 200 |
fld_b | (null) | 1 | 10 |
fld_b | (null) | 2 | 20 |
fld_b | (null) | 3 | 30 |
fld_b | (null) | 4 | 40 |
fld_b | (null) | 5 | 50 |
Upvotes: 0