Sebastian Goslin

Reputation: 497

Have a function wait until it returns 'n' number of results

I need to incorporate threading into my code due to the limits of my database. My problem is that I have a list of dictionaries (approximately 850 elements) and a list of elements of the same length, and I can only query 50 of them at a time. So I use a generator to split the lists into chunks of 50.

def list_split(ls):
    n = 50
    for i in range(0, len(ls), n):
        yield ls[i:i + n]
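
A quick way to check the chunk sizes before pointing this at the database, sketched with a made-up 120-element list (the sample data is an assumption, not the real input):

```python
def list_split(ls):
    n = 50
    # Yield successive slices of at most n items.
    for i in range(0, len(ls), n):
        yield ls[i:i + n]


sample = list(range(120))  # hypothetical stand-in data
chunk_sizes = [len(chunk) for chunk in list_split(sample)]
print(chunk_sizes)  # [50, 50, 20]
```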

I then pass both of these lists into a function that essentially adds each pair to a new dictionary. The value for each key will be the result of the query, which takes approximately 2 seconds per query.

def query(ls1, ls2):
    count = 0
    query_return_dict = {}

    for i, j in zip(ls2, ls1):
        for key, value in zip(i, j):
            # ret = token.query(j) replace 'value' with 'ret' once ready to run
            query_return_dict[key] = value
            count += 1

    print(query_return_dict)
    return query_return_dict

I then call them:

ls1 = list_split(unchunked_ls1)
ls2 = list_split(unchunked_ls2)

Now this is where I'm not understanding 'single' threading with this code block:

import threading


def main():
    thread = threading.Thread(target=query, args=(ls1, ls2))
    thread.start()

    thread.join()

if __name__ == '__main__':
    main()

I'm learning about threading via this site, but I don't know if it's doing what I intend it to do. I'm really hesitant to actually run this on our database, for risk of backing it up by flooding it with queries.

TL;DR,

I need to make sure that query(ls1, ls2) will only start to run again once 50 queries from ls1 (which is the list of dictionaries) have been returned and added to query_return_dict. Then it can run the next chunk of 50, until all elements in the query list have been queried.

ALSO:

If there is a better way to do this than threading, that would be awesome too!

As requested, here is what the format of the two lists looks like. Keep in mind that each has approximately 850 elements:

ls1 = ['34KWR','23SDG','903SD','256DF','41SDA','42DFS',...]  # len 850
ls2 = [{"ity": {"type": "IDE", "butes": [{"ity": {"id": "abc34"}}], "limit": 20}}, ...]  # len 850

Upvotes: 1

Views: 104

Answers (1)

chepner

Reputation: 530912

It's simpler if you zip first, then chunk. Also, let islice get one chunk at a time.

from itertools import islice


pairs = zip(unchunked_ls1, unchunked_ls2)

# Get the next 50 elements of pairs and return as a list.
# Because pairs is an iterator, not a list, the return value
# of islice changes each time you call it.
def get_query():
    return list(islice(pairs, 50))

# Repeatedly call get_query until it returns an empty list
for query in iter(get_query, []):
    # do your query
    ...
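
To fill in the "do your query" step, here is a minimal sketch of one way to make each batch of 50 finish before the next one starts. It assumes the strings from unchunked_ls1 are the result keys and the dictionaries from unchunked_ls2 are the query payloads, as in the sample data. run_query is a hypothetical stand-in for token.query, and query_in_batches is a name made up for this example. Whether the database is happy with 50 queries in flight at once is also an assumption; lower max_workers if it is not.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def run_query(payload):
    # Hypothetical stand-in for token.query(payload); replace with the real call.
    return {"echo": payload}


def query_in_batches(keys, payloads, batch_size=50):
    pairs = zip(keys, payloads)
    results = {}
    # One pool reused across batches; at most batch_size queries run at once.
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        # iter(callable, sentinel) stops once islice returns an empty batch.
        for batch in iter(lambda: list(islice(pairs, batch_size)), []):
            batch_keys = [key for key, _ in batch]
            batch_payloads = [payload for _, payload in batch]
            # list() drains pool.map, so this line blocks until every query
            # in the current batch has returned.
            batch_results = list(pool.map(run_query, batch_payloads))
            results.update(zip(batch_keys, batch_results))
    return results
```

Because the loop body does not move on until list(pool.map(...)) has collected all results, the next islice call only happens after the current 50 queries are done. With max_workers=1 the same code runs the queries strictly one after another.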

Upvotes: 1
