convert dataframe to a nested dictionary without column names

Question

This is a sample of my pandas dataframe and it contains close to 100k rows

import pandas as pd
df = pd.DataFrame({'cluster': ['5', '5', '5', '5', '5', '5'],
         'mdse_item_i': ['23627102',
                         '23627102',
                         '23627102',
                         '23627102',
                         '23627102',
                         '23627102'],
         'predPriceQty': ['35.675543',
                         '33.236678',
                         '35.675543',
                         '35.675543',
                         '35.675543',
                         '35.675543'],
         'schedule_i': ['56', '56', '56', '56', '56', '56'],
         'segment_id': ['4123', '4123', '4144', '4161', '4295', '4454'],
         'wk': ['1', '2', '1', '1', '1', '1']} )

segment_id	cluster	schedule_i	mdse_item_i	wk	predPriceQty
4123	5	56	23627102	1	35.675543
4123	5	56	23627102	2	33.236678
4144	5	56	23627102	1	35.675543
4161	5	56	23627102	1	35.675543
4295	5	56	23627102	1	35.675543
4454	5	56	23627102	1	35.675543

Below is the nested format of the dictionary i'm looking to achieve

{(4123, 5): {56.0: {23627102.0: {1: 35.6755430505491, 2:33.236678}}},
 (4144, 5): {56.0: {23627102.0: {1: 35.6755430505491}}},
 (4161, 5): {56.0: {23627102.0: {1: 35.6755430505491}}},
 (4295, 5): {56.0: {23627102.0: {1: 35.6755430505491}}},
 (4454, 5): {56.0: {23627102.0: {1: 35.6755430505491}}}}

The code below works for me but for huge dataframes, it takes hours to create the dictionary and i'm trying to avoid row-wise iteration

forecast_dict_all = {}
for _, row in df.iterrows():
        item_agg_id = int(row[segment_id])
        mdse_item_i = row["mdse_item_i"]
        cluster = int(row["cluster"])
        wk = int(row["wk"])
        forecast = float(row["predPriceQty"])
        schedule_id = row["schedule_i"]
        
        if (item_agg_id, cluster) not in forecast_dict_all:
            forecast_dict_all[item_agg_id, cluster] = {
                schedule_id: {mdse_item_i: {wk: forecast}}
            }

My solution so far

dict(df.groupby(['segment_id','cluster'],as_index=False).apply(lambda x: x.to_dict()).to_dict())

df.set_index(['segment_id', 'cluster'], inplace=True)
    
di = df.to_dict(orient='index')
    
forecast_dict_all = {k:{v['schedule_i']: {v['mdse_item_i']: {v['wk']: v['predPriceQty']}}} 
                            for k,v in di.items()}

df.set_index(['segment_id', 'cluster'], inplace=True)
{k:{grp['schedule_i']: {grp['mdse_item_i']: {grp['wk']: grp['predPriceQty']}}}
for k, grp in df.groupby(['schedule_i','mdse_item_i','wk','predPriceQty'])}

I even tried using zipping but in both the instances, i could not achieve the desired output.

Edit I am using python: 2.7.13.final.0 pandas: 0.20.1

Any help is appreciated, thanks

convert dataframe to a nested dictionary without column names

Answers (1)

Related Questions