Dino C

Reputation: 317

Compare geometries with geopandas

Let's say there are two DataFrames, df1 and df2. Each has one column (geometry_1 and geometry_2, respectively) holding geometries of LineString type.

df1    
    geometry_1
0   LINESTRING(37.00 59.00, 37.05 59.32)
.... 


df2
    geometry_2
0   LINESTRING(37.89 59.55, 38.05 60.32 )
....

Both DataFrames have more rows, but for now I want to focus on the following question: is there any way to evaluate whether the two lines are similar? By similar I mean that if the distance between the respective points of the lines is no greater than a given threshold (e.g. 100 m), the two lines are considered identical.

Upvotes: 3

Views: 2585

Answers (1)

Marjan Moderc

Reputation: 2859

The test that you are after (i.e. comparing vertex per vertex) has a very important constraint: there must be exactly the same number of vertices in both LineStrings, which is not very likely to happen.
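For reference, the vertex-by-vertex test described above could be sketched roughly as follows. The function name vertices_within_tolerance and its parameters are made up for illustration. It works on plain coordinate sequences and measures a straight-line distance in whatever units the coordinates use, so it is only a sketch under those assumptions:

```python
import math


def vertices_within_tolerance(coords1, coords2, tolerance):
    """Return True if both sequences have the same number of vertices and
    every pair of corresponding vertices is at most `tolerance` apart.

    Hypothetical helper: the distance is Euclidean, in coordinate units.
    """
    coords1 = list(coords1)
    coords2 = list(coords2)

    # The constraint mentioned above: the vertex counts must match exactly.
    if len(coords1) != len(coords2):
        return False

    # Compare the i-th vertex of one line with the i-th vertex of the other.
    return all(math.dist(p, q) <= tolerance for p, q in zip(coords1, coords2))


line_a = [(0.0, 0.0), (10.0, 0.0)]
line_b = [(0.0, 1.0), (10.0, 1.0)]
print(vertices_within_tolerance(line_a, line_b, 2.0))
```

With shapely geometries you would presumably pass geom1.coords and geom2.coords. Note that the test is also sensitive to vertex order, so a line digitized in the opposite direction would fail it.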

Since you obviously want a very basic, broad similarity check, I would start by comparing the main characteristics of your lines. You can do that with shapely's geometry attributes, as in the following self-explanatory example:

def are_geometries_similar(geom1, geom2, MAX_ALLOWED_DISTANCE=100, MAX_ALLOWED_DIFFERENCE_RATIO=0.1):
    """
    Compare two LineStrings by length, number of vertices and basic position.
    Return True if they pass all 3 tests within the specified margins, otherwise False.
    MAX_ALLOWED_DISTANCE is expressed in the units of the geometries' coordinates.
    """

    # 1. Compare length (relative difference; skipped if both lengths are 0):
    l1 = geom1.length
    l2 = geom2.length
    longest = max(l1, l2)

    if longest > 0 and abs(l1 - l2) / longest >= MAX_ALLOWED_DIFFERENCE_RATIO:
        return False

    # 2. Compare number of vertices (relative difference):
    vert_num1 = len(geom1.coords)
    vert_num2 = len(geom2.coords)
    most_vertices = max(vert_num1, vert_num2)

    if most_vertices > 0 and abs(vert_num1 - vert_num2) / most_vertices >= MAX_ALLOWED_DIFFERENCE_RATIO:
        return False

    # 3. Compare position by calculating the representative point
    rp1 = geom1.representative_point()
    rp2 = geom2.representative_point()

    if rp1.distance(rp2) > MAX_ALLOWED_DISTANCE:
        return False

    # If all tests passed, return True
    return True
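One thing to watch in step 3: shapely measures distance in the units of the coordinates. If the data is in longitude/latitude degrees, as the sample values in the question appear to be, a threshold of 100 means 100 degrees, not 100 m. One option is to reproject both GeoDataFrames to a metric CRS with to_crs before comparing. Another is to compute the distance in metres yourself. Below is a minimal sketch of the second option using the haversine formula, which is not part of the function above. It assumes (lon, lat) order in degrees and a spherical Earth, and the name haversine_m is made up for illustration:

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; a spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Approximate great-circle distance in metres between two lon/lat points."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)

    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# First vertices of the two sample lines from the question, assumed (lon, lat).
distance = haversine_m(37.00, 59.00, 37.89, 59.55)
print(distance <= 100)
```

You could use such a function in place of rp1.distance(rp2), passing rp1.x, rp1.y, rp2.x and rp2.y, while keeping MAX_ALLOWED_DISTANCE in metres.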

Upvotes: 2
