Reputation: 387
I want to convert this dataframe to dictionary,
import pandas as pd
df = pd.DataFrame({'Date': {0: 44197,1: 44197,2: 44197,3: 44197,4: 44198,5: 44198,6: 44198,7: 44198,8: 44197,9: 44197,10: 44197,11: 44197,12: 44198,13: 44198,14: 44198,15: 44198},
'Product': {0: 'B',1: 'A',2: 'C',3: 'D',4: 'B',5: 'A',6: 'C',7: 'D',8: 'B',9: 'A',10: 'C',11: 'D',12: 'B',13: 'A',14: 'C',15: 'D'},
'Seller': {0: 'XXX',1: 'XXX',2: 'XXX',3: 'XXX',4: 'XXX',5: 'XXX',6: 'XXX',7: 'XXX',8: 'YYY',9: 'YYY',10: 'YYY',11: 'YYY',12: 'YYY',13: 'YYY',14: 'YYY',15: 'YYY'},
'Price': {0: 10,1: 25,2: 36,3: 14,4: 60,5: 31,6: 2,7: 7,8: 5,9: 9,10: 10,11: 26,12: 36,13: 78,14: 95,15: 100}})
Here is the dataframe look like,
I want to create dict key based on Seller then for every date groupe products as key so A is key and Price is item
Order is Seller => Date (as the will be 1000 dates) => for each date you will have 4 products and thier corrsponding prices.
Does anyone know how to do this ? My true table is very large 100K lines, my goal is to make this as much fast as possible also.
My results I want to have will look like this:
{XXX:{"01/01/2021":{ A : Value1 , B : Value2 , C : Value3 , D : Value4 },"02/01/2021":{ .......}}, "YYY" : ....... }
Upvotes: 4
Views: 506
Reputation: 45241
First, set your desired nested index to a multi index using set_index:
>>> df.set_index(["Seller", "Date", "Product"])
Price
Seller Date Product
XXX 44197 B 10
A 25
C 36
D 14
44198 B 60
A 31
C 2
D 7
YYY 44197 B 5
A 9
C 10
D 26
44198 B 36
A 78
C 95
D 100
Then convert the multi-index to nested dictionaries using the to_dict method. Each column is a key in the dict, and multi-index are converted to keys, each of which is a tuple:
>>> df_transformed = df.set_index(["Seller", "Date", "Product"]).to_dict()
>>> df_transformed
{'Price': {('XXX', 44197, 'B'): 10, ('XXX', 44197, 'A'): 25, ('XXX', 44197, 'C'): 36, ('XXX', 44197, 'D'): 14, ('XXX', 44198, 'B'): 60, ('XXX', 44198, 'A'): 31, ('XXX', 44198, 'C'): 2, ('XXX', 44198, 'D'): 7, ('YYY', 44197, 'B'): 5, ('YYY', 44197, 'A'): 9, ('YYY', 44197, 'C'): 10, ('YYY', 44197, 'D'): 26, ('YYY', 44198, 'B'): 36, ('YYY', 44198, 'A'): 78, ('YYY', 44198, 'C'): 95, ('YYY', 44198, 'D'): 100}}
...capture the Price column by itself (but in this case it is of course the only column):
>>> price_dict = df_transformed.to_dict()["Price"]
>>> price_dict
{('XXX', 44197, 'B'): 10, ('XXX', 44197, 'A'): 25, ('XXX', 44197, 'C'): 36, ('XXX', 44197, 'D'): 14, ('XXX', 44198, 'B'): 60, ('XXX', 44198, 'A'): 31, ('XXX', 44198, 'C'): 2, ('XXX', 44198, 'D'): 7, ('YYY', 44197, 'B'): 5, ('YYY', 44197, 'A'): 9, ('YYY', 44197, 'C'): 10, ('YYY', 44197, 'D'): 26, ('YYY', 44198, 'B'): 36, ('YYY', 44198, 'A'): 78, ('YYY', 44198, 'C'): 95, ('YYY', 44198, 'D'): 100}
See this answer from Martijn Pieters for help with turning the tuple keys into nested dictionaries. I will just regurgitate that below for convenience, but it really is Pieters' solution:
def nest(d: dict) -> dict:
result = {}
for key, value in d.items():
target = result
for k in key[:-1]: # traverse all keys but the last
target = target.setdefault(k, {})
target[key[-1]] = value
return result
Finally:
>>> nest(price_dict)
{'XXX': {44197: {'B': 10, 'A': 25, 'C': 36, 'D': 14}, 44198: {'B': 60, 'A': 31, 'C': 2, 'D': 7}}, 'YYY': {44197: {'B': 5, 'A': 9, 'C': 10, 'D': 26}, 44198: {'B': 36, 'A': 78, 'C': 95, 'D': 100}}}
The OP mentions in a comment that the order of the dates matters. It is important to note that the order of the multi-index of the df_transformed
will determine the order of the final nested dictionary. If you want to sort it in different ways before turning it into the dictionary, use .sort_index().
Upvotes: 1
Reputation: 71689
Reshape the dataframe with pivot
then create temporary column and assign the records, finally unstack
on level=0
and use to_dict
to create a dictionary
s = df.pivot(['Seller', 'Date'], 'Product', 'Price')
s.assign(r=s.to_dict('records'))['r'].unstack(0).to_dict()
{'XXX': {44197: {'A': 25, 'B': 10, 'C': 36, 'D': 14},
44198: {'A': 31, 'B': 60, 'C': 2, 'D': 7}},
'YYY': {44197: {'A': 9, 'B': 5, 'C': 10, 'D': 26},
44198: {'A': 78, 'B': 36, 'C': 95, 'D': 100}}}
Upvotes: 1
Reputation: 3066
groupped_df = df.groupby(['Seller', 'Date']).apply(lambda x: x.set_index('Product').to_dict()).apply(lambda x: x['Price'])
desired_dict = {level: groupped_df.xs(level).to_dict() for level in groupped_df.index.levels[0]}
output:
Out[100]:
{'XXX': {44197: {'B': 10, 'A': 25, 'C': 36, 'D': 14},
44198: {'B': 60, 'A': 31, 'C': 2, 'D': 7}},
'YYY': {44197: {'B': 5, 'A': 9, 'C': 10, 'D': 26},
44198: {'B': 36, 'A': 78, 'C': 95, 'D': 100}}}
On the first line we are grouping and creating a series where seller and date are indices and values is a dictionary as desired (the most nested one). This is what happens on groupped_df = df.groupby(['Seller', 'Date']).apply(lambda x: x.set_index('Product').to_dict())
, but since it creates some extra "data" inside the dictionary, I also added .apply(lambda x: x['Price'])
to select only the price.
After the first line the groupped_df
looks like this:
Out[101]:
Seller Date
XXX 44197 {'B': 10, 'A': 25, 'C': 36, 'D': 14}
44198 {'B': 60, 'A': 31, 'C': 2, 'D': 7}
YYY 44197 {'B': 5, 'A': 9, 'C': 10, 'D': 26}
44198 {'B': 36, 'A': 78, 'C': 95, 'D': 100}
The second line shift the multi index into a nested dictionary, like in the this answer (notice no need to 'index' arg in to_dict
)
Upvotes: 1