Gogo78
Gogo78

Reputation: 387

Aggregate Dataframe to Nested Dictionaries (python)

I want to convert this dataframe to dictionary,

import pandas as pd
df = pd.DataFrame({'Date': {0: 44197,1: 44197,2: 44197,3: 44197,4: 44198,5: 44198,6: 44198,7: 44198,8: 44197,9: 44197,10: 44197,11: 44197,12: 44198,13: 44198,14: 44198,15: 44198},
 'Product': {0: 'B',1: 'A',2: 'C',3: 'D',4: 'B',5: 'A',6: 'C',7: 'D',8: 'B',9: 'A',10: 'C',11: 'D',12: 'B',13: 'A',14: 'C',15: 'D'},
 'Seller': {0: 'XXX',1: 'XXX',2: 'XXX',3: 'XXX',4: 'XXX',5: 'XXX',6: 'XXX',7: 'XXX',8: 'YYY',9: 'YYY',10: 'YYY',11: 'YYY',12: 'YYY',13: 'YYY',14: 'YYY',15: 'YYY'},
 'Price': {0: 10,1: 25,2: 36,3: 14,4: 60,5: 31,6: 2,7: 7,8: 5,9: 9,10: 10,11: 26,12: 36,13: 78,14: 95,15: 100}})

Here is the dataframe look like,

enter image description here

I want to create dict key based on Seller then for every date groupe products as key so A is key and Price is item

Order is Seller => Date (as the will be 1000 dates) => for each date you will have 4 products and thier corrsponding prices.

Does anyone know how to do this ? My true table is very large 100K lines, my goal is to make this as much fast as possible also.

My results I want to have will look like this:

{XXX:{"01/01/2021":{ A : Value1 , B : Value2 , C : Value3 , D : Value4 },"02/01/2021":{ .......}}, "YYY" : ....... }

Upvotes: 4

Views: 506

Answers (3)

Rick
Rick

Reputation: 45241

First, set your desired nested index to a multi index using set_index:

>>> df.set_index(["Seller", "Date", "Product"])
                      Price
Seller Date  Product
XXX    44197 B           10
             A           25
             C           36
             D           14
       44198 B           60
             A           31
             C            2
             D            7
YYY    44197 B            5
             A            9
             C           10
             D           26
       44198 B           36
             A           78
             C           95
             D          100

Then convert the multi-index to nested dictionaries using the to_dict method. Each column is a key in the dict, and multi-index are converted to keys, each of which is a tuple:

>>> df_transformed = df.set_index(["Seller", "Date", "Product"]).to_dict()
>>> df_transformed
{'Price': {('XXX', 44197, 'B'): 10, ('XXX', 44197, 'A'): 25, ('XXX', 44197, 'C'): 36, ('XXX', 44197, 'D'): 14, ('XXX', 44198, 'B'): 60, ('XXX', 44198, 'A'): 31, ('XXX', 44198, 'C'): 2, ('XXX', 44198, 'D'): 7, ('YYY', 44197, 'B'): 5, ('YYY', 44197, 'A'): 9, ('YYY', 44197, 'C'): 10, ('YYY', 44197, 'D'): 26, ('YYY', 44198, 'B'): 36, ('YYY', 44198, 'A'): 78, ('YYY', 44198, 'C'): 95, ('YYY', 44198, 'D'): 100}}

...capture the Price column by itself (but in this case it is of course the only column):

>>> price_dict = df_transformed.to_dict()["Price"]
>>> price_dict
{('XXX', 44197, 'B'): 10, ('XXX', 44197, 'A'): 25, ('XXX', 44197, 'C'): 36, ('XXX', 44197, 'D'): 14, ('XXX', 44198, 'B'): 60, ('XXX', 44198, 'A'): 31, ('XXX', 44198, 'C'): 2, ('XXX', 44198, 'D'): 7, ('YYY', 44197, 'B'): 5, ('YYY', 44197, 'A'): 9, ('YYY', 44197, 'C'): 10, ('YYY', 44197, 'D'): 26, ('YYY', 44198, 'B'): 36, ('YYY', 44198, 'A'): 78, ('YYY', 44198, 'C'): 95, ('YYY', 44198, 'D'): 100}

See this answer from Martijn Pieters for help with turning the tuple keys into nested dictionaries. I will just regurgitate that below for convenience, but it really is Pieters' solution:

def nest(d: dict) -> dict:
    result = {}
    for key, value in d.items():
        target = result
        for k in key[:-1]:  # traverse all keys but the last
            target = target.setdefault(k, {})
        target[key[-1]] = value
    return result

Finally:

>>> nest(price_dict)
{'XXX': {44197: {'B': 10, 'A': 25, 'C': 36, 'D': 14}, 44198: {'B': 60, 'A': 31, 'C': 2, 'D': 7}}, 'YYY': {44197: {'B': 5, 'A': 9, 'C': 10, 'D': 26}, 44198: {'B': 36, 'A': 78, 'C': 95, 'D': 100}}}

The OP mentions in a comment that the order of the dates matters. It is important to note that the order of the multi-index of the df_transformed will determine the order of the final nested dictionary. If you want to sort it in different ways before turning it into the dictionary, use .sort_index().

Upvotes: 1

Shubham Sharma
Shubham Sharma

Reputation: 71689

Reshape the dataframe with pivot then create temporary column and assign the records, finally unstack on level=0 and use to_dict to create a dictionary

s = df.pivot(['Seller', 'Date'], 'Product', 'Price')
s.assign(r=s.to_dict('records'))['r'].unstack(0).to_dict()

{'XXX': {44197: {'A': 25, 'B': 10, 'C': 36, 'D': 14},
         44198: {'A': 31, 'B': 60, 'C': 2, 'D': 7}},
 'YYY': {44197: {'A': 9, 'B': 5, 'C': 10, 'D': 26},
         44198: {'A': 78, 'B': 36, 'C': 95, 'D': 100}}}

Upvotes: 1

Roim
Roim

Reputation: 3066

groupped_df = df.groupby(['Seller', 'Date']).apply(lambda x: x.set_index('Product').to_dict()).apply(lambda x: x['Price'])
desired_dict = {level: groupped_df.xs(level).to_dict() for level in groupped_df.index.levels[0]}

output:

Out[100]: 
{'XXX': {44197: {'B': 10, 'A': 25, 'C': 36, 'D': 14},
  44198: {'B': 60, 'A': 31, 'C': 2, 'D': 7}},
 'YYY': {44197: {'B': 5, 'A': 9, 'C': 10, 'D': 26},
  44198: {'B': 36, 'A': 78, 'C': 95, 'D': 100}}}

On the first line we are grouping and creating a series where seller and date are indices and values is a dictionary as desired (the most nested one). This is what happens on groupped_df = df.groupby(['Seller', 'Date']).apply(lambda x: x.set_index('Product').to_dict()), but since it creates some extra "data" inside the dictionary, I also added .apply(lambda x: x['Price']) to select only the price.

After the first line the groupped_df looks like this:

Out[101]: 
Seller  Date 
XXX     44197     {'B': 10, 'A': 25, 'C': 36, 'D': 14}
        44198       {'B': 60, 'A': 31, 'C': 2, 'D': 7}
YYY     44197       {'B': 5, 'A': 9, 'C': 10, 'D': 26}
        44198    {'B': 36, 'A': 78, 'C': 95, 'D': 100}

The second line shift the multi index into a nested dictionary, like in the this answer (notice no need to 'index' arg in to_dict)

Upvotes: 1

Related Questions