Peggy

Reputation: 143

Python - How to improve performance of subsetting tuple key dictionaries through for loop?

I have two dictionaries with tuple keys. I split each one into three smaller dictionaries and then applied a prediction model to each of them, as shown below.

# Subset dictionaries from df_dict140 and df_dict150
# df_dict140 has 456 rows and df_dict150 has 415
# I split them into chunks of 200
dic1 = {k: df_dic[k] for k in df_dict140[:200]}
dic2 = {k: df_dic[k] for k in df_dict140[200:400]}
dic3 = {k: df_dic[k] for k in df_dict140[400:456]}

dic4 = {k: df_dic[k] for k in df_dict150[:200]}
dic5 = {k: df_dic[k] for k in df_dict150[200:400]}
dic6 = {k: df_dic[k] for k in df_dict150[400:415]}

I then applied the model to each dictionary:

def predictionModel(pred_dict): 
    prediction = {}
    for (key1, key2), value in pred_dict.items():
        m = Prophet().fit(value)
        future = m.make_future_dataframe(periods = 365)
        forecast = m.predict(future)
        prediction[key2] = forecast[['ds','yhat']].tail()    
    return prediction 

The prediction results:

prediction1 = predictionModel(dic1)
prediction2 = predictionModel(dic2)
prediction3 = predictionModel(dic3)
prediction4 = predictionModel(dic4)
prediction5 = predictionModel(dic5)
prediction6 = predictionModel(dic6)

Is it possible to write a function or a for loop that does the work above, so that I don't have to subset the two dictionaries separately and can get all the results at once?

Upvotes: 0

Views: 112

Answers (2)

BallpointBen

Reputation: 13750

# Note that Python lets you slice beyond the end of a list without error
# (the slice is simply truncated; only indexing a single element raises IndexError)
smaller_dicts = [{k: df_dic[k] for k in dfd[200*i:200*(i+1)]}
                 for dfd in [df_dict140, df_dict150]
                 for i in range(3)]

predictions = [predictionModel(sd) for sd in smaller_dicts]
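The snippet above hardcodes three chunks per list. A minimal sketch of the same idea for any list length follows. It assumes, as the slicing in the question suggests, that df_dict140 is a list of tuple keys; the `chunked` helper and the placeholder data are hypothetical and stand in for the real DataFrames.

```python
def chunked(keys, size):
    """Yield successive slices of `keys` holding at most `size` items each."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


# Hypothetical stand-in data: tuple keys mapped to placeholder values
# (in the question the values are DataFrames passed to Prophet).
df_dic = {("a", i): i for i in range(456)}
df_dict140 = list(df_dic)  # assumed to be a list of tuple keys

# One small dictionary per chunk; the last chunk is simply shorter.
smaller_dicts = [{k: df_dic[k] for k in chunk}
                 for chunk in chunked(df_dict140, 200)]
chunk_sizes = [len(d) for d in smaller_dicts]

# Slicing past the end of the list is safe and returns what is left.
beyond_end = df_dict140[400:600]
```

With 456 keys this produces chunks of 200, 200 and 56 items, matching dic1 to dic3 in the question.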

Upvotes: 1

AShelly

Reputation: 35540

I still don't understand what you are doing with Prophet, but if you just want to slice the dictionaries programmatically:

for i in range(0, max(len(df_dict140), len(df_dict150)), 200):
    # Slices past the end of the shorter list are just shorter or empty
    d140_slice = {k: df_dic[k] for k in df_dict140[i:i+200]}
    d150_slice = {k: df_dic[k] for k in df_dict150[i:i+200]}
    # do something with the slices
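If the goal is to get every prediction in one call, the slicing loop can be wrapped in a function that takes the model as an argument. This is only a sketch: `predict_in_chunks` and `fake_model` are hypothetical names, and `fake_model` stands in for the question's predictionModel so the example runs without Prophet installed.

```python
def predict_in_chunks(key_lists, data, model, size=200):
    """Apply `model` to each chunk of each key list; return results in order."""
    results = []
    for keys in key_lists:
        for start in range(0, len(keys), size):
            subset = {k: data[k] for k in keys[start:start + size]}
            results.append(model(subset))
    return results


def fake_model(pred_dict):
    """Hypothetical stand-in for predictionModel: doubles each value."""
    return {key2: value * 2 for (key1, key2), value in pred_dict.items()}


# Placeholder data shaped like the question: 456 keys and 415 keys.
data = {("g140", i): i for i in range(456)}
data.update({("g150", i): i for i in range(1000, 1415)})
keys140 = [k for k in data if k[0] == "g140"]
keys150 = [k for k in data if k[0] == "g150"]

predictions = predict_in_chunks([keys140, keys150], data, fake_model)
result_sizes = [len(p) for p in predictions]
```

The returned list has one entry per chunk, in the same order as prediction1 to prediction6 in the question. With the real model, the call would presumably be predict_in_chunks([df_dict140, df_dict150], df_dic, predictionModel).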

Upvotes: 0
