Reputation: 23
The function shown below runs quite slowly, even though I used swifter to call it. Does anyone know how to speed this up? My Python knowledge is limited at this point, and I would appreciate any help I can get. I tried using the map() function, but somehow it didn't work for me. I guess the nested for loop is what makes it slow, right?
BR, Hannes
def polyData(uniqueIds):
    for index in range(len(uniqueIds) - 1):
        element = uniqueIds[index]
        polyData1 = df[df['id'] == element]
        poly1 = build_poly(polyData1)
        poly1 = poly1.buffer(0)
        for secondIndex in range(index + 1, len(uniqueIds)):
            otherElement = uniqueIds[secondIndex]
            polyData2 = df[df['id'] == otherElement]
            poly2 = build_poly(polyData2)
            poly2 = poly2.buffer(0)
            # Calculate overlap, percentage-wise
            overlap_pct = poly1.intersection(poly2).area / poly1.area
            # Form new DF
            df_ol = pd.DataFrame({'id_1': [element], 'id_2': [otherElement], 'overlap_pct': [overlap_pct]})
            # Write to SQL database
            df_ol.to_sql(name='df_overlap', con=e, if_exists='append', index=False)
Upvotes: 2
Views: 78
Reputation:
This function is inherently slow for large amounts of data because of its complexity: it examines every 2-combination of the ids, which is O(n²) pairs. On top of that, you're rebuilding the poly for the same id multiple times, even though each one only needs to be built once. Since building a poly seems to be the expensive step, compute them all up front and store them for later use — that is, extract the building of the polys out of the nested loops.
def getPolyForUniqueId(uid):
    polyData = df[df['id'] == uid]
    poly = build_poly(polyData)
    poly = poly.buffer(0)
    return poly
def polyData(uniqueIds):
    polyDataList = [getPolyForUniqueId(uid) for uid in uniqueIds]
    for index in range(len(uniqueIds) - 1):
        id_1 = uniqueIds[index]
        poly_1 = polyDataList[index]
        for secondIndex in range(index + 1, len(uniqueIds)):
            id_2 = uniqueIds[secondIndex]
            poly_2 = polyDataList[secondIndex]
            ...
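Two more things usually help on top of caching: itertools.combinations generates each unordered pair once (replacing the manual index arithmetic), and collecting all result rows into one DataFrame for a single to_sql call avoids issuing one INSERT per pair. Here is a minimal sketch of that shape — expensive_build and overlap_table are hypothetical names, and the dict stand-in replaces the real shapely polygon and pandas/SQL calls, which are noted in comments:

```python
from itertools import combinations

def expensive_build(uid):
    # Stand-in for the expensive step: in the real code this would be
    # build_poly(df[df['id'] == uid]).buffer(0) returning a shapely polygon.
    return {"area": float(uid) + 1.0}

def overlap_table(unique_ids):
    # 1. Build each poly exactly once, cached by position.
    polys = [expensive_build(uid) for uid in unique_ids]
    rows = []
    # 2. combinations(..., 2) yields each unordered pair exactly once,
    #    replacing the nested index loops.
    for (id_1, p1), (id_2, p2) in combinations(zip(unique_ids, polys), 2):
        # Stand-in for poly1.intersection(poly2).area / poly1.area
        overlap_pct = min(p1["area"], p2["area"]) / p1["area"]
        rows.append({"id_1": id_1, "id_2": id_2, "overlap_pct": overlap_pct})
    # 3. In the real code, write everything in one go:
    #    pd.DataFrame(rows).to_sql('df_overlap', con=e,
    #                              if_exists='append', index=False)
    return rows
```

For n ids this still does n*(n-1)/2 intersection calls (that part is unavoidable if you need every pair), but each poly is built only once and the database sees a single bulk append instead of one tiny write per pair.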
Upvotes: 1