David542
David542

Reputation: 110093

What takes so long in the following python for loop

I am doing a search on 1M rows of data against ten terms, so I have to do a billion iterations. I will probably need to move outside of python to have anything useful, but what accounts for the "slowness" of the following double-for loop in python?

for idx, row in enumerate(self.data): # length of 1M

    has_row_match = False
    fields_with_partial_matches = {}
    field_to_cast_value = {}
    matched_terms = set()
    skip_columns = set()   # If that column is already matched to a term
                           # And not a multi-word column, don't allow it to be searched for another term
    if idx % 1000 == 0: print idx

    for term_obj in term_objs: # length of 10

        term = term_obj['searchAs']
        search_type = term_obj['searchType']
        data_type = term_obj['dataType']
        field = term_obj['Field']
        has_term_match = False

Ran search in 7.2528s

Basically I'm just initializing empty objects and then doing a dict lookup. What accounts for the tremendous runtime then in this (when I haven't even yet started "doing anything")?

Upvotes: 0

Views: 82

Answers (1)

Corley Brigman
Corley Brigman

Reputation: 12371

I can't definitely speak for your system, but by going through pieces, it looks like it's the dictionary lookups:

In [8]: test_dict = {'aaaaaaa': 1}  
In [9]: %%timeit 
   ...: for _ in range(10000000): 
   ...:    val = test_dict['aaaaaaa'] 
   ...:                                                                                                                                                                                                 
582 ms ± 8.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

as opposed to

In [7]: %%timeit 
   ...: for _ in range(1000000):  
   ...:   x = set() 
   ...:    
   ...:  
   ...:                                                                                                                                                                                                 
122 ms ± 674 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [6]: %%time 
   ...: for i in range(1000000): 
   ...:     if (i % 1000) == 0: 
   ...:         print (i) 
   ...:
0
...                                                                                                                                                                                                 
999000
CPU times: user 118 ms, sys: 4.74 ms, total: 122 ms
Wall time: 121 ms

In [2]: %%timeit 
   ...: for _ in range(1000000):  
   ...:   x = {} 
   ...:  
   ...:                                                                                                                                                                                                 
51.4 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

(i would really have expected the prints to take longer).

You do 4 dict lookups, 2 set creations, 2 dict creations, and a lookup in this simple example. This is on my system, not yours, but just there we have at least 3 seconds. I don't really know why it's taking that long, but since those are the big pieces, you could try those on your system and see what kind of performance you get.

Some things that are different from e.g. C is that variables are always referenced by name, and python has to look them up by name every time. Even if we don't get an item, just finding the d2 object takes python some time:

In [15]: %%timeit 
    ...: for _ in range(10000000): 
    ...:    val = d2 
    ...:                                                                                                                                                                                                
368 ms ± 2.54 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So, finding d2 is ~2/3s of the time, and actually doing the item lookup is 1/3 the time or so.

There are probably a lot of other places like this, but scripting is just different than compiled programming.

Upvotes: 4

Related Questions