Edgars T.

Reputation: 1149

Redshift numeric precision truncating

I have encountered a situation I can't explain in how Redshift handles division of SUMs.

Here is an example table:

create table public.datatype_test(
a numeric(19,6),
b numeric(19,6));
insert into public.datatype_test values(222222.2222, 333333.3333);
insert into public.datatype_test values(444444.4444, 666666.6666);

Now I run this query:

select sum(a)/sum(b) from public.datatype_test;

I get the result 0.6666 (4 decimals). This is not a tool display issue; the query really returns only 4 decimal places, no matter how big or small the numbers in the table are. In my case 4 decimals is not precise enough. The same holds true if I use AVG instead of SUM.

If I use MAX instead of SUM, I get 0.6666666666666666666 (19 decimals).
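
For reference, the MAX variant of the query (against the same test table) is simply:

select max(a)/max(b) from public.datatype_test;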

It also returns the correct result (0.6666666666666667) when no physical table is used:

with t as (
select 222222.2222::numeric(19,6) as a, 333333.3333::numeric(19,6) as b union all 
select 444444.4444::numeric(19,6) as a, 666666.6666::numeric(19,6) as b
)
select sum(a)/sum(b) as d from t; 

I have looked into the Redshift documentation on SUM and Computations with Numeric Values, but I still can't reconcile the result with what the documentation describes.

Using the float datatype for the table columns is not an option, as I need to store precise currency amounts and 15 significant digits are not enough.

Using a cast on the SUM aggregation also gives 0.6666666666666666666 (19 decimals):

select sum(a)::numeric(19,6)/sum(b) from public.datatype_test;

But it looks wrong: I can't force BI tools to apply this workaround, and nobody else who uses this data should have to resort to it either.

I have tried the same test in PostgreSQL 10, and it works as it should, returning a sufficient number of decimals for the division.

Is there anything I can do in the database setup to avoid casting in the SQL query? Any advice or guidance is highly appreciated.

Redshift version: PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.4081. Using dc2.8xlarge nodes.

Upvotes: 1

Views: 10833

Answers (1)

wrschneider

Reputation: 18780

I have run into similar issues, and although I don't have a solution that doesn't require a workaround, I can at least explain it.

The precision/scale of the result of a division is defined by the rules in the "Computations with Numeric Values" documentation.

A consequence of those rules is that a decimal(19,6) divided by another decimal(19,6) will return decimal(38,19).

What's happening in your case, though, is that MAX returns the same precision/scale as the underlying column, while SUM returns decimal(38,*) no matter what (probably a safety precaution to prevent overflow when summing "big data"). If you divide one decimal(38,6) by another, you get decimal(38,4).
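
One way to see this for yourself is to materialize the aggregates and inspect the resulting column types (a rough sketch: the table name public.type_check is just illustrative, and pg_table_def only lists tables in schemas on your search_path):

create table public.type_check as
select max(a) as max_a, sum(a) as sum_a, sum(a)/sum(b) as ratio
from public.datatype_test;

select "column", type from pg_table_def where tablename = 'type_check';

If the above is right, max_a comes back as numeric(19,6), sum_a as numeric(38,6), and ratio as numeric(38,4).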

AWS support will probably not consider this a defect -- there is no SQL standard for how to treat decimal precision in division, and given that this is documented behavior, it's probably a deliberate decision.

The only way to address this is to typecast the numerator, or to multiply it by something like sum(a) * cast(1 as decimal(10,9)), which is portable SQL and will force more decimal places in the numerator and thus in the result.
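
Applied to the test table from the question, that workaround would look something like:

select sum(a) * cast(1 as decimal(10,9)) / sum(b) from public.datatype_test;

The cast(1 as decimal(10,9)) factor widens the scale of the numerator before the division rules kick in, which is what forces the extra decimal places in the result.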

As a convenience I made a calculator in JSFiddle with the rules so you can play around with different options:

// Result type of dividing decimal(p1, s1) by decimal(p2, s2), per Redshift's rules
function divisionType(p1, s1, p2, s2) {
    var scale = Math.max(4, s1 + p2 - s2 + 1)
    var precision = p1 - s1 + s2 + scale

    if (precision > 38) {
        scale = Math.max((38 + scale - precision), 4)
        precision = 38
    }
    return { precision: precision, scale: scale }
}
// divisionType(19, 6, 19, 6) -> { precision: 38, scale: 19 }
// divisionType(38, 6, 38, 6) -> { precision: 38, scale: 4 }

Upvotes: 4
