Jonsey
Jonsey

Reputation: 1

Handling failed geography validation from crossing edges in Snowflake

I am attempting to load WKT representations of polygons and multipolygons into a Snowflake GEOGRAPHY-type column. Each polygon has thousands of points, each of which is defined by lat/long points defined with upwards of 10 points of precision after the decimal.

As I attempt to cast my WKT strings to the proper format for insertion, ie TO_GEOGRAPHY(), I get variations of this error:

100217 (P0000): Geography validation failed: Edge (-77.495072,35.477751) -- (-77.495247,35.477899) crosses edge (-77.495247,35.477899) -- (-77.495247,35.477899)

This particular error was thrown when I tried to process the following WKT: (its very long, so hosted here; my apologies)

For convenience, here's the section of that string with the crossing edges:

...
-77.49480800034235 35.47764599906043, -77.49480800036956 35.477645999070795,
-77.49507200036956 35.47775099907079, -77.49507200051436 35.47775099914243,
-77.49507200064573 35.47775099923645, -77.49524700064573 35.47789899923645,
-77.49524700064573 35.477898999236444, -77.49524700072536 35.47789899931164,
-77.49552600072536 35.47819299931164,-77.4955260008524 35.4781929994771, 
...

Obviously I can just ignore the invalid edges by setting allow_invalid = TRUE in the TO_GEOGRAPHY() fn, forcing the load, but that doesn't meet my needs - I need some way to make these polygons valid, ideally without truncating every single coordinate value to make it less precise.

I did find this thread somewhat useful, but the recommended solution doesn't seem to apply to all my polygons. Applied to the example string above, a new edge is revealed as a problem:

100217 (P0000): Geography validation failed: Edge (-77.173830,35.470360) -- (-77.173830,35.469260) crosses edge (-77.173830,35.469266) -- (-77.173831,35.468991)

Does anyone have any advice for next steps - solutions or even debugging steps? Like I said, this error occurs on many of my thousands of polygons, and I'm stumped.

Happy to provide more details wherever needed, or additional examples.

Upvotes: 0

Views: 79

Answers (1)

Simeon Pilgrim
Simeon Pilgrim

Reputation: 26078

So I grab the huge block of points, put them into VS Code, regex'ed it a valid SQL value list:

  1. , -> ),\n(

    the \n should be a shift+enter to put a new line token in. Then

  2. -> ,

Now you can slap that into a insert statement like:

create table lat_lons as
select 
    $1::number(30,16) as lat,
    $2::number(30,16) as lon,
    row_number() over (order by true) as rn,
from values
(-77.6865440007049,35.6471289992901),
(-77.68671800056762,35.6472919991615),
(-77.68744500042364,35.647631999094166),
(-77.68744500042364,35.64763199909417),

albeit 7K lines long.

Then you can truncate those values to 5dp, and only keep the first of each set.

    select * from 
    (
        select *,
            floor(lat,5) as lat_s,
            floor(lon,5) as lon_s,
            conditional_change_event(array_construct(lat_s,lon_s)) over (order by rn) as change_group
        from lat_lons
    )
    qualify row_number() over (partition by change_group order by rn) = 1

then turn those back into a text list:

    select listagg(lat_s||' '||lon_s, ',') within group (order by rn) as l,
        count(*) as c
    from d1

then turn that into a GEOGRAPTHY:

with d1 as (
    select * from 
    (
        select *,
            floor(lat,5) as lat_s,
            floor(lon,5) as lon_s,
            conditional_change_event(array_construct(lat_s,lon_s)) over (order by rn) as change_group
        from lat_lons
    )
    qualify row_number() over (partition by change_group order by rn) = 1
), to_list as (
    select listagg(lat_s||' '||lon_s, ',') within group (order by rn) as l,
        count(*) as c
    from d1
)
select c, 
    length(l),
    to_geography('MULTIPOLYGON((('|| l ||')))')
from to_list;

enter image description here

for 7 DP you get 7700 points instead of 7273. I found the value up to 14 worked.

Upvotes: 0

Related Questions