Reputation: 23
The function shown below runs quite slowly, even though I used swifter to call it. Does anyone know how to speed this up? My Python knowledge is limited at this point, and I would appreciate any help I can get. I tried using the map() function, but somehow it didn't work for me. I guess the nested for loop is what makes it slow, right?
BR, Hannes
def polyData(uniqueIds):
    for index in range(len(uniqueIds) - 1):
        element = uniqueIds[index]
        polyData1 = df[df['id'] == element]
        poly1 = build_poly(polyData1)
        poly1 = poly1.buffer(0)
        for secondIndex in range(index + 1, len(uniqueIds)):
            otherElement = uniqueIds[secondIndex]
            polyData2 = df[df['id'] == otherElement]
            poly2 = build_poly(polyData2)
            poly2 = poly2.buffer(0)
            # Calculate overlap, percentage-wise
            overlap_pct = poly1.intersection(poly2).area / poly1.area
            # Form new DF
            df_ol = pd.DataFrame({'id_1': [element], 'id_2': [otherElement], 'overlap_pct': [overlap_pct]})
            # Write to SQL database
            df_ol.to_sql(name='df_overlap', con=e, if_exists='append', index=False)
Upvotes: 2
Views: 78
Reputation:
This function is inherently slow for large amounts of data because of its complexity: it examines every 2-combination of the ids, which is O(n²) pairs. On top of that, you're rebuilding the poly for the same id multiple times, even though each one only needs to be built once. Since building a poly seems to be the expensive step, compute them all up front and store them for later use — that is, extract the building of the polys out of the nested loops.
def getPolyForUniqueId(uid):
    polyData = df[df['id'] == uid]
    poly = build_poly(polyData)
    poly = poly.buffer(0)
    return poly
def polyData(uniqueIds):
    polyDataList = [getPolyForUniqueId(uid) for uid in uniqueIds]
    for index in range(len(uniqueIds) - 1):
        id_1 = uniqueIds[index]
        poly_1 = polyDataList[index]
        for secondIndex in range(index + 1, len(uniqueIds)):
            id_2 = uniqueIds[secondIndex]
            poly_2 = polyDataList[secondIndex]
            ...
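Two more things usually help on top of caching: itertools.combinations generates each unordered pair once (replacing the manual index arithmetic), and collecting all result rows into one DataFrame for a single to_sql call avoids issuing one INSERT per pair. Here is a minimal sketch of that shape — expensive_build and overlap_table are hypothetical names, and the dict stand-in replaces the real shapely polygon and pandas/SQL calls, which are noted in comments:

```python
from itertools import combinations

def expensive_build(uid):
    # Stand-in for the expensive step: in the real code this would be
    # build_poly(df[df['id'] == uid]).buffer(0) returning a shapely polygon.
    return {"area": float(uid) + 1.0}

def overlap_table(unique_ids):
    # 1. Build each poly exactly once, cached by position.
    polys = [expensive_build(uid) for uid in unique_ids]
    rows = []
    # 2. combinations(..., 2) yields each unordered pair exactly once,
    #    replacing the nested index loops.
    for (id_1, p1), (id_2, p2) in combinations(zip(unique_ids, polys), 2):
        # Stand-in for poly1.intersection(poly2).area / poly1.area
        overlap_pct = min(p1["area"], p2["area"]) / p1["area"]
        rows.append({"id_1": id_1, "id_2": id_2, "overlap_pct": overlap_pct})
    # 3. In the real code, write everything in one go:
    #    pd.DataFrame(rows).to_sql('df_overlap', con=e,
    #                              if_exists='append', index=False)
    return rows
```

For n ids this still does n*(n-1)/2 intersection calls (that part is unavoidable if you need every pair), but each poly is built only once and the database sees a single bulk append instead of one tiny write per pair.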
Upvotes: 1