Bas Hamer
Bas Hamer

Reputation: 324

what is the best way to reduce complexity of geometries

so I'm playing around with the http://www.gadm.org/ dataset;

I want to go from lat & lon to a country and state (or equivalent).

So to simplify the data I'm grouping it up and unioning the geometies; so far so good. the results are great, I can pull back Belgium and it is fine.

I pull back australia and I get victoria because the thing is too damn large.

Now I honestly don't care too much about the level of detail; if lines are w/in 1 km of where they should be I'm ok (as long as shapes are bigger, not smaller)

What is the best approach to reduce the complexity of my geospatial objects so I end up with a quick and simple view of the world?

All data is kept as Geometry data.

Upvotes: 2

Views: 4339

Answers (3)

DS Labs
DS Labs

Reputation: 11

A more sophisticated approach to reducing complex geometry (if your using SQL Server 2017 and above) is to use Python with SQL Server Machine Learning Services.

Prerequisites:

You can then use sp_execute_external_script to access python shape processing libraries like shapely through a stored procedure:

IF OBJECT_ID(N'.dbo.usp_Python_Remove_Non_Essential_Vectors', N'P') IS NOT NULL
DROP PROCEDURE dbo.usp_Python_Remove_Non_Essential_Vectors;
GO

CREATE PROCEDURE dbo.usp_Python_Remove_Non_Essential_Vectors
(
    @Geom_Input GEOMETRY, 
    @Tolerance FLOAT,
    @Geom_Output GEOMETRY OUTPUT
)
AS
BEGIN
    IF OBJECT_ID('tempdb..#GeomData_In') IS NOT NULL DROP TABLE #GeomData_In;
    CREATE TABLE #GeomData_In (Shape VARBINARY(MAX));

    INSERT INTO #GeomData_In (Shape)
    SELECT @Geom_Input.STAsBinary();

    IF OBJECT_ID('tempdb..#GeomData_Out') IS NOT NULL DROP TABLE #GeomData_Out;
    CREATE TABLE #GeomData_Out (Shape VARBINARY(MAX));

-- Python is sensitive to tab characters - this must be against the left edge
INSERT INTO #GeomData_Out
EXEC sp_execute_external_script
@language = N'Python',
@script = N'
import shapely.wkb
import pandas as pd

# Deserialize the geometry from WKB to a Shapely object
geom = shapely.wkb.loads(InputDataSet["Shape"][0])

# Simplify the geometry with a very small tolerance to remove non-essential points
simplified_geom = geom.simplify(tolerance=Tolerance, preserve_topology=True)

# Serialize the simplified geometry back to WKB for SQL Server
simplified_wkb = shapely.wkb.dumps(simplified_geom)

# Create a pandas DataFrame to hold the simplified geometry''s WKB
OutputDataSet = pd.DataFrame({"SimplifiedGeometry": [simplified_wkb]})
',
@input_data_1 = N'SELECT Shape FROM #GeomData_In',
@params = N'@Tolerance FLOAT',
@Tolerance = @Tolerance;

    SET @Geom_Output = ( SELECT GEOMETRY::STGeomFromWKB(Shape,4326).MakeValid() FROM #GeomData_Out );
END;
GO

Python's shapely package takes a floating point tolerance. You can experiment with different values until it works for your specific set of shapes.

Here is a manually created shape and example of how to pull a shape from a table. You can process these in a while loop or adjust the python logic to use a whole table as the input.

DECLARE @Shape_Simplified GEOMETRY;
DECLARE @Test_Geom GEOMETRY;
    
-- Create a shape manually for testing
SET @Test_Geom = GEOMETRY::STPolyFromText('POLYGON ((' +
    '-116.45468031099995 41.111344923000047,' + 
    '-116.45469829799998 41.114942212000074,' + 
    '-116.45471628499996 41.118611446000045,' + 
    '-116.45948269299998 41.118602452000061,' + 
    '-116.46424910099995 41.118602452000061,' + 
    '-116.46422211999999 41.116531828000052,' +  -- Extra point on a straight line
    '-116.46422211999999 41.114951204000079,' + 
    '-116.46418614699996 41.113371580000056,' +  -- Extra point on a straight line
    '-116.46418614699996 41.111362909000036,' + 
    '-116.45943772599998 41.111353916000041,' + 
    '-116.45468031099995 41.111344923000047))',4326);

---- Or load a shape from the database
--SET @Test_Geom = ( SELECT TOP 1 Shape FROM YourTable );

DECLARE @Tolerance FLOAT = 0.00005; -- Define the tolerance here
EXECUTE dbo.usp_Python_Remove_Non_Essential_Vectors 
    @Test_Geom, 
    @Tolerance, 
    @Shape_Simplified OUTPUT;

Once you have run the procedure you can visualize the results by combining the original and simplified shapes along with buffered points to see what was removed:

-- Create a temporary table to store the original points
IF OBJECT_ID('tempdb.dbo.#Original_Points') IS NOT NULL
DROP TABLE #Original_Points;

CREATE TABLE #Original_Points 
(
    ID INTEGER IDENTITY(1,1) NOT NULL,
    X FLOAT NOT NULL,
    Y FLOAT NOT NULL,
    Point GEOMETRY NOT NULL
);

-- Insert the points into the temporary table using a join on a numbers table
INSERT  INTO #Original_Points
SELECT  @Test_Geom.STPointN(n.Number).STX AS X, 
        @Test_Geom.STPointN(n.Number).STY AS Y,
        GEOMETRY::Point(
            @Test_Geom.STPointN(n.Number).STX, 
            @Test_Geom.STPointN(n.Number).STY, 
            4326).STBuffer(0.0001) AS Point
FROM    Numbers AS n
WHERE   n.Number <= @Test_Geom.STNumPoints();

-- Create a temporary table to store the simplified points
IF OBJECT_ID('tempdb.dbo.#Simplified_Points') IS NOT NULL
DROP TABLE #Simplified_Points;

CREATE TABLE #Simplified_Points 
(
    ID INTEGER IDENTITY(1,1) NOT NULL,
    X FLOAT NOT NULL,
    Y FLOAT NOT NULL,
    Point GEOMETRY NOT NULL
);

-- Insert the points into the temporary table using a join on a numbers table
INSERT  INTO #Simplified_Points
SELECT  @Shape_Simplified.STPointN(n.Number).STX AS X, 
        @Shape_Simplified.STPointN(n.Number).STY AS Y,
        GEOMETRY::Point(
            @Shape_Simplified.STPointN(n.Number).STX, 
            @Shape_Simplified.STPointN(n.Number).STY, 
            4326).STBuffer(0.0001) AS Point
FROM    Numbers AS n
WHERE   n.Number <= @Shape_Simplified.STNumPoints();

Now you can compare the results to see how your shape was affected:

-- Show numeric difference between the amount of points before and after processing
SELECT  FORMAT(COUNT(*),'N0') AS Original_Points,
        FORMAT(( SELECT COUNT(*) FROM #Simplified_Points ),'N0') AS Simplified_Points
FROM    #Original_Points;

-- Original Points
SELECT  'Original Shape' AS Shape_Type,
        @Test_Geom AS Shape
UNION ALL
SELECT  'Original Vector' AS Shape_Type,
        Point
FROM    #Original_Points;

-- Simplified Points
SELECT  'Simplified Shape' AS Shape_Type,
        @Shape_Simplified AS Shape
UNION ALL
SELECT  'Simplified Vector' AS Shape_Type,
        Point
FROM    #Simplified_Points;

The final output shows how many points were removed - in this case using the sample shape:

  • Original_Points = 11
  • Simplified_Points = 5

The results can be visualized on the Spatial Results tab:

Spatial Results Tab

Adjust the @Tolerance variable until you find the right balance of points to remove and you've got your less-complex shapes.

Upvotes: 0

psousa
psousa

Reputation: 6726

As you've tagged the question with "tsql" I'm assuming you're using Sql Server. Thus, you already have an handy function called Reduce which you can apply on the geometry data type to simplify it.

For example (copied from the above link):

DECLARE @g geometry;
SET @g = geometry::STGeomFromText('LINESTRING(0 0, 0 1, 1 0, 2 1, 3 0, 4 1)', 0);
SELECT @g.Reduce(.75).ToString();

The function receives a tolerance argument with the simplification threshold.

Upvotes: 7

Lyth
Lyth

Reputation: 2211

I suppose complexity is determined only by the number of vertices in a shape. There are quite a number of shape simplifying algorithms to choose from (and maybe some source too).

As a simplistic approach, you can iterate over vertices and reject concave ones if the result does not intoduce an error too large (e.g. in terms of added area), preferably adjoining smaller segments into larger. A more sophisticated approach might break an existing segment to better remove smaller ones.

Upvotes: 1

Related Questions