Terry Carmen
Terry Carmen

Reputation: 3886

Incorrect data from MariaDB POLYGON SELECT

Server: MariaDB 10.4.17

INSERTing a POLYGON with 14 digits to the right of the decimal point, then SELECTing the same data, returns a POLYGON with 15 digits to the right of the decimal point, which is more data than actually exists, and the excess precision is incorrect.

INSERTing a 0-padded POLYGON with 15 digits to the right of the decimal point, then SELECTing the same data, returns a POLYGON with 15 digits to the right of the decimal point, however the SELECTed data is incorrect in the last digit and is not the 0 used for right-padding.

Because the table data is incorrect, the various Geometry functions like ST_Contains() produce incorrect results. This appears to be some sort of floating point type of error, but I'm not sure how to work around it.

Is there any way to make MariaDB save, use and return the same data is was given?

Example:


INSERT INTO `Area` 
        (`Name`, `Coords`) 
VALUES ('Test ', GeomFromText('POLYGON((
                    -76.123527198020080 43.010597920077250,
                    -76.128263410842290 43.016193091211520,
                    -76.130763247573610 43.033194256815040,
                    -76.140676208063910 43.033514863935440,
                    -76.13626333248750 43.008550330099250,
                    -76.123527198020080 43.010597920077250))'));

SELECT Coords FROM `Area` WHERE `Name` = 'Test';

POLYGON ((
                     -76.123527198020085 43.010597920077252,
                     -76.128263410842294 43.01619309121152,
                     -76.130763247573611 43.033194256815037,
                     -76.140676208063908 43.033514863935437,
                     -76.136263332487502 43.008550330099247,
                     -76.123527198020085 43.010597920077252
         ))

Edit:

As per @Michael-Entin the floating point error was a dead end and could not be responsible for the size of the errors I was getting.

Update:

The problem was "me". I had accidentally used MBRContains() in one of the queries instead of ST_Contains().

MBRContains uses the "Minimum Bounding Rectangle" that will contain the polygon, not the actual POLYGON coordinates.

Using MBRContains had caused the area to be significantly larger than expected, and appeared to be a processing error, which it was not.

ST_Contains() is slower but respects all the POLYGON edges and yields correct results.

Thanks to @Michael-Entin for noticing that the floating point error couldn't account for the magnitude of the error I was experiencing. This information pointed me in the right direction.

Upvotes: 1

Views: 261

Answers (2)

aka.nice
aka.nice

Reputation: 9372

Most floating point hardware will be in base 2.
If we try and decompose the absolute value of -76.128263410842290 in base 2 it's:

64 (2^6) + 8 (2^3) + 4 (2^2) + 0.125 (2^-3) + ...

Somehow we can note this number in base two with a sequence of bits 1001100.001...
Bad luck, in base 2, this number would require an infinite sequence of such bits.
The sequence begins with:

1001100.001000001101010111011110111100101101011101001110111000...

But floats have limited precision, the significand only has 53 bits in IEEE double precision, including the bits BEFORE the fraction separator.
That means that the least significant bit (the unit of least precision) represents 2^-46...

1001100.001000001101010111011110111100101101011101001110111000...
1001100.00100000110101011101111011110010110101110101

Notice that the floating point value has been rounded up (to the nearest float).

Let's multiply 2^-46 by appropriate power of five 5^46/5^46: it is 5^46/10^46.
It means that its DECIMAL representation ends exactly 46 places after the DECIMAL point, or a bit less if the trailing bits of float significand are zero (not the case here, trailing bit is 1).

So potentially, the fraction part of those floating point numbers has about 46 digits, not even 14 nor 15 as you seem to assume.

If we turn this floating point value back to decimal, we indeed get:

-76.12826341084229397893068380653858184814453125
-76.128263410842290

See it's indeed slightly greater than your initial input here, because the float was rounded to upper.

If you ask to print 15 decimal places AFTER the fraction separator, you get a rounded result.

-76.128263410842294

In this float number, the last bit 2^-46 has the decimal value

0.0000000000000142108547152020037174224853515625

where 142108547152020037174224853515625 is 5^46, you can do the math.

The immediate floating point values will differ in this last bit (we can add or subtract it)

1001100.00100000110101011101111011110010110101110100
1001100.00100000110101011101111011110010110101110101
1001100.00100000110101011101111011110010110101110110

It means that the immediate floating point neighbours are about +/- 1.42 10^-14 further...
This means that you cannot trust the 14th digits after the fraction, double precision does not have such resolution!
Not a surprise that the nearest float falls up to 7 10^-15 off your specified input sometimes (half the resolution, thanks to round to nearest rule).
Remember, float precision is RELATIVE, if we consume bits left of fraction separator, we reduce the precision of the fraction part (the point is floating literally).

This is very basic knowledge scientists should acquire before using floating point.
I hope those examples help as a very restricted introduction.

Upvotes: 0

Michael Entin
Michael Entin

Reputation: 7724

I think the precision you have is reaching the limit of the 64-bit floating point, and what you get is really the nearest floating point value representable by CPU.

The code below prints the input value without any modification, and then the very next double floating point values decremented and incremented by smallest possible amounts:

int main() {
    const double f = -76.123527198020080;
    cout << setprecision(17) << f << endl
        << nextafter(f, -INFINITY) << endl
        << nextafter(f, INFINITY) << endl;
}

The results I get

-76.123527198020085
-76.123527198020099
-76.123527198020071

As you see, -76.123527198020085 is the nearest value to your coordinate -76.123527198020080, and its closest possible neighbors are -76.123527198020099 (even further), and -76.123527198020071 (also slightly further, but to a different direction).

So I don't think there is any way to keep the precision you want. Nor there should be a practical reason to keep such precision (the difference is less than a micron, i.e. 1e-6 of a meter).

What you should be looking at is how exactly ST_Contains does not meet your expectations. The geometric libraries usually do snapping with tolerance distance that is slightly higher than the numeric precision of coordinates, and this should ideally make sure such minor differences in input values don't affect the outcome of such function.

Upvotes: 3

Related Questions