Reputation: 317
Let us say there are two DataFrames, df1 and df2. Both have one column (geometry_1 and geometry_2 respectively) containing geometries of LineString type.
df1
geometry_1
0 LINESTRING(37.00 59.00, 37.05 59.32)
....
df2
geometry_2
0 LINESTRING(37.89 59.55, 38.05 60.32 )
....
Both DataFrames have more rows, but for now I want to focus on the following question: is there any way to evaluate whether the two lines are similar? By similar I mean that if the distance between the respective points of the lines is no greater than a given threshold (e.g. 100 m), the two lines are considered identical.
Upvotes: 3
Views: 2585
Reputation: 2859
The test that you are after (i.e. comparing the lines vertex by vertex) has a very important constraint: both LineStrings must have exactly the same number of vertices, which is not very likely to happen.
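For reference, the vertex-by-vertex test from the question could be sketched roughly as below. This is only an illustration under stated assumptions: coordinates are taken to be (longitude, latitude) pairs in degrees, distances are approximated with the haversine formula and a mean Earth radius of 6,371 km, and the helper names haversine_m and vertices_match are hypothetical, not part of shapely.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (an approximation)

def haversine_m(p, q):
    """Great-circle distance in metres between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def vertices_match(coords1, coords2, max_distance_m=100):
    """True if both vertex lists have the same length and each pair is within the threshold."""
    if len(coords1) != len(coords2):
        return False  # the constraint described above: vertex counts must be identical
    return all(haversine_m(p, q) <= max_distance_m for p, q in zip(coords1, coords2))
```

With shapely geometries, list(line.coords) gives the coordinate tuples that such a function expects. The early return on differing vertex counts is exactly why this test is rarely usable on real data.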
Since you obviously want a very basic, broad similarity check, I would start by comparing the main characteristics of your lines. You can do that with shapely's geometry attributes, as in the following self-explanatory example:
def are_geometries_similar(geom1, geom2, MAX_ALLOWED_DISTANCE=100, MAX_ALLOWED_DIFFERENCE_RATIO=0.1):
    """
    Compare two LineStrings by length, number of vertices and basic position.
    Return True if they pass all 3 tests within the specified margin of error, otherwise False.
    MAX_ALLOWED_DISTANCE is in the units of the geometries' coordinates
    (degrees for lon/lat data, so reproject to a metric CRS to work in metres).
    """
    # 1. Compare length (skipped if both lengths are 0, to avoid division by zero):
    l1 = geom1.length
    l2 = geom2.length
    if max(l1, l2) > 0 and abs(l1 - l2) / max(l1, l2) >= MAX_ALLOWED_DIFFERENCE_RATIO:
        return False
    # 2. Compare number of vertices (skipped if both geometries are empty):
    vert_num1 = len(geom1.coords)
    vert_num2 = len(geom2.coords)
    if max(vert_num1, vert_num2) > 0 and abs(vert_num1 - vert_num2) / max(vert_num1, vert_num2) >= MAX_ALLOWED_DIFFERENCE_RATIO:
        return False
    # 3. Compare position using the distance between the representative points:
    rp1 = geom1.representative_point()
    rp2 = geom2.representative_point()
    if rp1.distance(rp2) > MAX_ALLOWED_DISTANCE:
        return False
    # If all tests passed, return True
    return True
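If the vertex counts differ but a distance-based test is still wanted, one possible alternative (not the method above, but a Hausdorff-style check) is to measure how far each vertex of one line lies from the other line. The sketch below is pure Python and rests on assumptions: coordinates are planar and already in metres (i.e. projected, not lon/lat), each line has at least two vertices, and the function names are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its end points
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def directed_distance(src, dst):
    """Largest distance from any vertex of src to the nearest point on polyline dst."""
    segments = list(zip(dst, dst[1:]))
    return max(min(point_segment_distance(p, a, b) for a, b in segments) for p in src)

def lines_within(coords1, coords2, max_distance=100):
    """True if every vertex of each line lies within max_distance of the other line."""
    d = max(directed_distance(coords1, coords2), directed_distance(coords2, coords1))
    return d <= max_distance
```

Because only vertices are sampled, a long segment that bulges away from the other line between its end points can be underestimated. shapely geometries also provide a hausdorff_distance method, which may be a simpler route when working with LineString objects directly.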
Upvotes: 2