Reputation: 5824
In Redshift I can create a histogram – in this case it's binning a column named metric into 100ms buckets
select floor(metric / 100) * 100 as bin, count(*) as impressions
from tablename
where epoch > date_part(epoch, dateadd(day, -1, sysdate))
and metric is not null
group by bin
order by bin
There's a danger that some of the bins might be empty and won't appear in the result set, so I want to use generate_series
to create the empty bins e.g.
select *, 0 as impressions from generate_series(0, maxMetricValue, 100) as bin
and union the two sets of results together to produce the 'full' histogram
select bin, sum(impressions)
from
(
select floor(metric/100)*100 as bin, count(*) as impressions
from tablename
where epoch > date_part(epoch, dateadd(day, -1, sysdate))
and metric is not null
group by bin
order by bin
)
union
(
select *, 0 as impressions from generate_series(0, maxMetricValue, 100) as bin
)
oroup by bin
order by bin
The challenge is that calculating the maxMetricValue requires a subquery i.e. select max(metric)… etc
and I'd like to avoid that
Is there a way I can calculate the max value from the histogram query and use that instead?
Edit:
Something like this seems along the right lines but Redshift doesn't like it
with histogram as (
select cast(floor(metric/100)*100 as integer) as bin, count(*) as impressions
from table name
and epoch > date_part(epoch, dateadd(day, -1, sysdate))
and metric is not null
group by bin
order by bin)
select bin, sum(impressions)
from (
select * from histogram
union
select *, 0 as impressions from generate_series(0, (select max(bin) from histogram), 100) as bin
)
group by bin
order by bin
I get this error, but there are no INFO messages visible: ERROR: Specified types or functions (one per INFO message) not supported on Redshift tables.
If I remove the cast I get: ERROR: function generate_series(integer, double precision, integer) does not exist Hint: No function matches the given name and argument types. You may need to add explicit type casts.
If I try using cast
or convert
in the parameter for generate_series
I get the first error again!
Edit 2:
Presume the above query is failing because Redshift is trying to execute generate_series
on a compute node rather than a leader but not sure
Upvotes: 2
Views: 545
Reputation: 11042
First off generate_series is a leader-only function and will throw an error when used in combo with user data. A recursive CTE is the way to do this but since this isn't what you want I won't get into it.
You could create a numbers table and calculate the min, max and count from the other data you know. You could then outer join on some condition that will never match.
However i expect you will be much better off the union all you already have.
Upvotes: 0