Peggy

Reputation: 143

Python - How to efficiently iterate through the subsets of a dictionary?

I have a dictionary with 500 DataFrames in it. Each DataFrame has the columns 'date' and 'num_patients'. I apply a Prophet model to every DataFrame in the dictionary, but the Python kernel crashes because of the large amount of data in the dictionary.

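from prophet import Prophet  # the package was named fbprophet before v1.0

# Prophet.fit expects columns named 'ds' and 'y'; rename 'date'/'num_patients' first if needed
prediction_all = {}
for key, value in dict.items():
    model = Prophet(holidays = holidays).fit(value)
    future = model.make_future_dataframe(periods = 365)
    forecast = model.predict(future)
    prediction_all[key] = forecast.tail()  # keep only the last rows of each forecast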

So I split the dictionary into subsets of 50 and applied the model to each subset:

dict1 = {k: dict[k] for k in sorted(dict.keys())[:50]}
prediction_dict1 = {}
for key, value in dict1.items():
    model = Prophet(holidays = holidays).fit(value)
    future = model.make_future_dataframe(periods = 365)
    forecast = model.predict(future)
    prediction_dict1[key] = forecast.tail()

dict2 = {k: dict[k] for k in sorted(dict.keys())[50:100]}
prediction_dict2 = {}
for key, value in dict2.items():
    model = Prophet(holidays = holidays).fit(value)
    future = model.make_future_dataframe(periods = 365)
    forecast = model.predict(future)
    prediction_dict2[key] = forecast.tail()

But I would need to repeat this code 10 times, since 500 DataFrames make 10 subsets of 50. Is there a more efficient way to do this?

Upvotes: 2

Views: 1002

Answers (1)

Raymond Hettinger

Reputation: 226336

One immediate improvement is to drop the sorted()-and-slice step and use heapq.nsmallest() instead, which makes fewer comparisons when you only need the n smallest keys, and lets you select the keys once instead of re-sorting the whole dictionary for every subset. Also, .keys() is not necessary, since iterating over a dict yields its keys.
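Both points can be checked on a toy dictionary. In this sketch the keys (`clinic_001` and so on) and the `None` values are made-up stand-ins for the 500 DataFrames, since only the keys matter for the selection step:

```python
import heapq

# Hypothetical stand-in for the dictionary of DataFrames; only the keys matter here.
frames = {f"clinic_{i:03d}": None for i in range(500, 0, -1)}

# Iterating over a dict yields its keys, so .keys() adds nothing.
assert list(frames) == list(frames.keys())

# nsmallest picks the 100 smallest keys via a bounded heap instead of sorting all 500.
lowest_keys = heapq.nsmallest(100, frames)
assert lowest_keys == sorted(frames)[:100]

# The two subsets are then plain slices of that one key list.
dict1 = {k: frames[k] for k in lowest_keys[:50]}
dict2 = {k: frames[k] for k in lowest_keys[50:100]}
print(list(dict1)[:3], list(dict2)[:1])
```

The heapq documentation notes that nsmallest() performs best for small n; once the selection covers all 500 keys, a single sorted(dict) call is the simpler and faster choice.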

Replace:

 dict1 = {k: dict[k] for k in sorted(dict.keys())[:50]}
 dict2 = {k: dict[k] for k in sorted(dict.keys())[50:100]}

With:

 import heapq

 lowest_keys = heapq.nsmallest(100, dict)
 dict1 = {k: dict[k] for k in lowest_keys[:50]}
 dict2 = {k: dict[k] for k in lowest_keys[50:100]}

In the big for-loop, key is used to store each forecast (prediction_all[key] = forecast.tail()), so .items() is the right choice there; .values() would be enough only if the key were not needed.
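Since the only thing that changes between dict1, dict2, ... is the slice, the ten copies of the loop can be replaced by a single loop over batches. This is a sketch under assumptions, not a tested Prophet pipeline: `fit_and_forecast` and `forecast_in_batches` are hypothetical names, and the frames are assumed to already have the 'ds' and 'y' columns Prophet expects.

```python
from typing import Any, Callable, Dict, Hashable, List, Mapping


def fit_and_forecast(df, holidays=None, periods=365):
    """Hypothetical helper wrapping the Prophet steps from the question."""
    from prophet import Prophet  # imported lazily; requires the prophet package
    model = Prophet(holidays=holidays).fit(df)
    future = model.make_future_dataframe(periods=periods)
    return model.predict(future).tail()


def forecast_in_batches(
    frames: Mapping[Hashable, Any],
    forecast_fn: Callable[[Any], Any],
    batch_size: int = 50,
) -> List[Dict[Hashable, Any]]:
    """Apply forecast_fn to every frame, grouped into batches of sorted keys.

    Returns one dict per batch, playing the role of prediction_dict1,
    prediction_dict2, ... without copying the loop for each subset.
    """
    keys = sorted(frames)
    batches = []
    for start in range(0, len(keys), batch_size):
        batch_keys = keys[start:start + batch_size]
        batches.append({k: forecast_fn(frames[k]) for k in batch_keys})
    return batches


# Real use (assumes `dict` and `holidays` from the question):
# prediction_batches = forecast_in_batches(
#     dict, lambda df: fit_and_forecast(df, holidays=holidays))

# Demo with a stub in place of the Prophet step:
demo = forecast_in_batches({i: i for i in range(500)}, lambda x: x * 2)
print(len(demo), [len(b) for b in demo[:3]])
```

Each fitted model is local to `fit_and_forecast`, so it can be garbage-collected once its `.tail()` is returned; only the small tail frames accumulate in the results. If one flat dict is preferred, the batches can be merged with `{k: v for b in prediction_batches for k, v in b.items()}`.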

Upvotes: 3
