Rebel

Reputation: 515

Speed up nested if statements under a for loop

On a 2d plane, there is a large circle centered at (0,0) with radius 𝑅𝑜. It encloses roughly 100 smaller circles, distributed randomly across the parent circle but with known radii and positions relative to the origin. (Some smaller sub-circles may lie partially or entirely inside larger sub-circles.)

The entire plane is gridded uniformly into pixels whose sides run along the coordinate axes. The pixel size is fixed and known a priori, and is much smaller than the parent circle; on the order of ~1000 of these pixels, scattered across the parent circle, are special. We are given the 2D Cartesian coordinates of the centers of all these special pixels. Sub-circles that enclose at least one special pixel are called *special* sub-circles below.

Now imagine that the plane is filled with ~100,000,000 particles. My code tries to add up these particles within each of the special sub-circles.

I managed to get the code working, but with this many particles it is very slow, as written below. I would like to find a trick to speed it up by at least an order of magnitude.

.
.
.
from math import log10  # import shown here for completeness

sub_circle_catalog, some_parameter_catalog, totals = [], [], {}  # must persist across all special pixels

for x, y in zip(vals1, vals2):  # vals1, vals2 hold the 2d positions of the *special* pixels; each is a 1d array of size ~1000
    enclosing_circles, some_parameter = {}, {}  # reset for each special pixel

    for id_, mass in zip(ids_data, mass_data):  # both arrays have ~100,000,000 elements
        rule1 = some_condition        # checks whether this special pixel lies within this circle
        rule2 = some_other_condition  # keeps only circles larger than some threshold size

        if rule1 and rule2:
            calculated_property = some_function

            if condition_3:
                calculated_some_other_property = some_other_function

                if condition_4:
                    some_quantity = something
                    enclosing_circles[id_] = round(log10(mass), 4)
                    some_parameter[id_] = float('{:.3e}'.format(some_quantity))

    # collect the IDs of all sub-circles enclosing this special pixel
    sub_circle_catalog += [(enclosing_circles[i], 1) for i in enclosing_circles]
    some_parameter_catalog += [(enclosing_circles[i], some_parameter[i]) for i in enclosing_circles]

# add up all special pixels in each sub-circle
for key, value in sub_circle_catalog:
    totals[key] = totals.get(key, 0) + value
totals_list = sorted(totals.items())

with open(some_file_path, "a") as some_file:  # the with block closes the file; no explicit close() needed
    print(totals_list, file=some_file)
.
.
.

Upvotes: 0

Views: 308

Answers (1)

gilch

Reputation: 11651

rule1 and rule2 under the second for are taking the longest.

Inline rule1 and rule2 in the if itself. and short-circuits: it won't evaluate its second operand if the first is already false. Also try swapping the two conditions to see if checking the other one first is faster.

Depending on the details of how these rules are calculated, you may find other opportunities for shortcuts like this.
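
For example, a minimal sketch (all names hypothetical) of the inlined, short-circuited test, with the cheap size check first:

# Hypothetical stand-ins for the real circle catalog and one special pixel.
circles = {1: (0.0, 0.0, 5.0), 2: (3.0, 4.0, 0.5)}  # id -> (cx, cy, radius)
x, y = 1.0, 1.0
r_min = 1.0  # size threshold from rule2

hits = []
for circle_id, (cx, cy, radius) in circles.items():
    # Cheap test first: `and` short-circuits, so the distance computation
    # is skipped entirely whenever the circle is too small.
    if radius > r_min and (x - cx) ** 2 + (y - cy) ** 2 < radius ** 2:
        hits.append(circle_id)
print(hits)  # -> [1]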


Always profile to find the bottlenecks. You can waste a lot of time optimizing other parts that won't help much.
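
For instance, the standard library's cProfile can show where the time actually goes; a sketch, assuming the slow loops are wrapped in a main() function:

import cProfile
import pstats

def main():
    sum(i * i for i in range(1_000_000))  # stand-in for the real workload

cProfile.run("main()", "stats.prof")  # record a profile to a file
pstats.Stats("stats.prof").sort_stats("cumulative").print_stats(10)  # show the top 10 entries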

Shortcut when possible; don't waste time calculating things you don't need.
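
One concrete shortcut for this geometry (a sketch with hypothetical names): a cheap per-axis test can reject most particles before the exact distance test ever runs:

cx, cy, r = 0.0, 0.0, 2.0  # hypothetical circle
r2 = r * r

def inside(px, py):
    dx = px - cx
    if dx < -r or dx > r:  # cheap 1D rejection, no multiplication needed
        return False
    dy = py - cy
    if dy < -r or dy > r:
        return False
    return dx * dx + dy * dy <= r2  # exact test only for the few survivors

print(inside(0.5, 0.5), inside(10.0, 0.0))  # -> True False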

Avoid function calls in nested loops by inlining them. Calls are kinda slow in CPython.
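
As a sketch of what inlining means here (the helper is hypothetical):

from math import hypot

points = [(0.1, 0.2), (3.0, 4.0), (0.5, 0.5)]
cx, cy, r = 0.0, 0.0, 1.0

# With a helper: one Python function call per point.
def dist(px, py):
    return hypot(px - cx, py - cy)

count = sum(1 for px, py in points if dist(px, py) < r)

# Inlined: the same logic pasted into the loop. No call overhead, and
# comparing squared distances also avoids the square root.
r2 = r * r
count = sum(1 for px, py in points if (px - cx) ** 2 + (py - cy) ** 2 < r2)
print(count)  # -> 2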

Unroll inner loops to reduce looping overhead.
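
Manual unrolling rarely buys much in pure CPython, but as a sketch of the idea:

xs = [0.1, 0.9, 2.0, 0.3, 1.5]
threshold = 1.0

# Unrolled by two: half the loop bookkeeping, plus a tail for odd lengths.
count = 0
n = len(xs)
for i in range(0, n - 1, 2):
    count += xs[i] < threshold
    count += xs[i + 1] < threshold
if n % 2:
    count += xs[-1] < threshold
print(count)  # -> 3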

Calculate things outside of the loop whenever possible instead of redoing it each loop.
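
For example (hypothetical names), anything that does not depend on the loop variable should be computed once, before the loop:

from math import log10

masses = [1e10, 5e11, 3e12]
mass_threshold = 1e11

# Bad: log10(mass_threshold) is recomputed on every iteration.
kept = [m for m in masses if log10(m) > log10(mass_threshold)]

# Better: hoist the invariant out of the loop.
log_cut = log10(mass_threshold)
kept = [m for m in masses if log10(m) > log_cut]
print(kept)  # the two masses above the threshold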


Consider compiling the whole thing with Nuitka, Cython, or PyPy. (Or just the slow parts with Cython or Numba.)
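
A sketch with Numba, one of the options above: decorating the hot loop with @njit compiles it to machine code on first call, which tends to help exactly this kind of numeric loop:

import numpy as np
from numba import njit

@njit
def count_inside(xs, ys, cx, cy, r):
    # A plain Python-style loop, but compiled by Numba.
    r2 = r * r
    total = 0
    for i in range(xs.size):
        dx = xs[i] - cx
        dy = ys[i] - cy
        if dx * dx + dy * dy < r2:
            total += 1
    return total

xs = np.random.rand(1_000_000)
ys = np.random.rand(1_000_000)
print(count_inside(xs, ys, 0.5, 0.5, 0.1))  # roughly pi * 0.1**2 * 1e6 points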

Consider rewriting this part in Julia, which is faster and easy to call from Python. It's better to extract and call the whole inner loop, rather than just its body, to avoid call overhead on every iteration.

Consider vectorizing calculations with numpy where possible, even if it's only part of the loop. Numpy's internal loops are much faster than Python's, though vectorizing can use more memory. If you can make numpy vectorization work, you might get even greater speedups with CuPy, which runs the same operations on the GPU, or Dask, which can handle larger-than-memory data sets.
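
A sketch of what the vectorized particle test could look like here (names hypothetical; with ~1e8 particles you may need to process the arrays in chunks to control memory):

import numpy as np

# Hypothetical particle data; the real arrays hold ~100,000,000 entries.
n = 10_000_000
xs = np.random.rand(n).astype(np.float32)
ys = np.random.rand(n).astype(np.float32)
masses = np.random.rand(n).astype(np.float32)

cx, cy, r = 0.5, 0.5, 0.01  # one special sub-circle

# One vectorized pass replaces the 100-million-iteration Python loop.
inside = (xs - cx) ** 2 + (ys - cy) ** 2 < r * r  # boolean mask
count = np.count_nonzero(inside)     # number of particles in the circle
total_mass = masses[inside].sum()    # summed particle property
print(count, total_mass)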

Upvotes: 5
