programmer pro
programmer pro

Reputation: 179

How can i iterate over a large list quickly?

I am trying to iterate over a large list. I want a method that can iterate this list quickly. But it takes much time to iterate. Is there any method to iterate quickly or python is not built to do this. My code snippet is :-

for i in THREE_INDEX:
    if check_balanced(rc, pc):
        print('balanced')
    else:
        rc, pc = equation_suffix(rc, pc, i) 

Here THREE_INDEX has a length of 117649. It takes much time to iterate over this list, is there any method to iterate it quicker. But it takes around 4-5 minutes to iterate

equation_suffix functions:

def equation_suffix(rn, pn,  suffix_list):
    len_rn = len(rn)
    react_suffix = suffix_list[: len_rn]
    prod_suffix = suffix_list[len_rn:]
    for re in enumerate(rn):
        rn[re[0]] = add_suffix(re[1], react_suffix[re[0]])
    for pe in enumerate(pn):
        pn[pe[0]] = add_suffix(pe[1], prod_suffix[pe[0]])
    return rn, pn

check_balanced function:

def check_balanced(rl, pl):
    total_reactant = []
    total_product = []
    reactant_name = []
    product_name = []
    for reactant in rl:
        total_reactant.append(separate_num(separate_brackets(reactant)))
    for product in pl:
        total_product.append(separate_num(separate_brackets(product)))
    for react in total_reactant:
        for key in react:
            val = react.get(key)
            val_dict = {key: val}
            reactant_name.append(val_dict)
    for prod in total_product:
        for key in prod:
            val = prod.get(key)
            val_dict = {key: val}
            product_name.append(val_dict)

    reactant_name = flatten_dict(reactant_name)
    product_name = flatten_dict(product_name)

    for elem in enumerate(reactant_name):
        val_r = reactant_name.get(elem[1])
        val_p = product_name.get(elem[1])
        if val_r == val_p:
            if elem[0] == len(reactant_name) - 1:
                return True
        else:
            return False

Upvotes: 2

Views: 4113

Answers (2)

ShadowRanger
ShadowRanger

Reputation: 155363

General for loops aren't an issue, but using them to build (or rebuild) lists is usually slower than using list comprehensions (or in some cases, map/filter, though those are advanced tools that are often a pessimization).

Your functions could be made significantly simpler this way, and they'd get faster to boot. Example rewrites:

def equation_suffix(rn, pn, suffix_list):
    prod_suffix = suffix_list[len(rn):]
    # Change `rn =` to `rn[:] = ` if you must modify the caller's list as in your
    # original code, not just return the modified list (which would be fine in your original code)
    rn = [add_suffix(r, suffix) for r, suffix in zip(rn, suffix_list)]  # No need to slice suffix_list; zip'll stop when rn exhausted
    pn = [add_suffix(p, suffix) for p, suffix in zip(pn, prod_suffix)]
    return rn, pn

def check_balanced(rl, pl):
    # These can be generator expressions, since they're iterated once and thrown away anyway
    total_reactant = (separate_num(separate_brackets(reactant)) for reactant in rl)
    total_product = (separate_num(separate_brackets(product)) for product in pl)
    reactant_name = []
    product_name = []
    # Use .items() to avoid repeated lookups, and concat simple listcomps to reduce calls to append
    for react in total_reactant:
        reactant_name += [{key: val} for key, val in react.items()]
    for prod in total_product:
        product_name += [{key: val} for key, val in prod.items()]

    # These calls are suspicious, and may indicate optimizations to be had on prior lines
    reactant_name = flatten_dict(reactant_name)
    product_name = flatten_dict(product_name)

    for i, (elem, val_r) in enumerate(reactant_name.items()):
        if val_r == product_name.get(elem):
            if i == len(reactant_name) - 1:
                return True
        else:
            # I'm a little suspicious of returning False the first time a single
            # key's value doesn't match. Either it's wrong, or it indicates an
            # opportunity to write short-circuiting code that doesn't have
            # to fully construct reactant_name and product_name when much of the time
            # there will be an early mismatch
            return False

I'll also note that using enumerate without unpacking the result is going to get worse performance, and more cryptic code; in this case (and many others), enumerate isn't needed, as listcomps and genexprs can accomplish the same result without knowing the index, but when it is needed, always unpack, e.g. for i, elem in enumerate(...): then using i and elem separately will always run faster than for packed in enumerate(...): and using packed[0] and packed[1] (and if you have more useful names than i and elem, it'll be much more readable to boot).

Upvotes: 1

RonZhang724
RonZhang724

Reputation: 575

I believe the reason why "iterating" the list take a long time is due to the methods you are calling inside the for loop. I took out the methods just to test the speed of the iteration, it appears that iterating through a list of size 117649 is very fast. Here is my test script:

import time

start_time = time.time()
new_list = [(1, 2, 3) for i in range(117649)]
end_time = time.time()
print(f"Creating the list took: {end_time - start_time}s")

start_time = time.time()
for i in new_list:
    pass
end_time = time.time()
print(f"Iterating the list took: {end_time - start_time}s")

Output is:

Creating the list took: 0.005337953567504883s
Iterating the list took: 0.0035648345947265625s

Edit: time() returns second.

Upvotes: 2

Related Questions