Reputation: 171
I have a dataframe and want to convert it to a list of dictionaries. I use read_csv()
to create this dataframe. The dataframe looks like the following:
AccountName AccountType StockName Allocation
0 MN001 #1 ABC 0.4
1 MN001 #1 ABD 0.6
2 MN002 #2 EFG 0.5
3 MN002 #2 HIJ 0.4
4 MN002 #2 LMN 0.1
The desired output:
[{'ABC':0.4, 'ABD':0.6}, {'EFG':0.5, 'HIJ':0.4,'LMN':0.1}]
I have tried to research on similar topics and used the Dataframe.to_dict()
function. I look forward to getting this done. Many thanks for your help!
Upvotes: 0
Views: 1924
Reputation: 4855
This should do it:
portfolios = []
for _, account in df.groupby('AccountName'):
portfolio = {stock['StockName']: stock['Allocation']
for _, stock in account.iterrows()}
portfolios.append(portfolio)
First use the groupby()
function to group the rows of the dataframe by AccountName
. To access the individual rows (stocks) for each account, you use the iterrows()
method. As user @ebb-earl-co explained in the comments, the _
is there as a placeholder variable, because iterrows()
returns (index, Series
) tuples, and we only need the Series
(the rows themselves). From there, use a dict comprehension to create a dictionary mapping StockName
-> Allocation
for each stock. Finally, append that dictionary to the list of portfolios
, resulting in the expected output:
[{'ABC': 0.4, 'ABD': 0.6}, {'EFG': 0.5, 'HIJ': 0.4, 'LMN': 0.1}]
One more thing: if you decide later that you want to label each dict in the portfolios
with the account name, you could do it like this:
portfolios = []
for acct_name, account in df.groupby('AccountName'):
portfolio = {stock['StockName']: stock['Allocation']
for _, stock in account.iterrows()}
portfolios.append({acct_name: portfolio})
This will return a list of nested dicts like this:
[{'MN001': {'ABC': 0.4, 'ABD': 0.6}},
{'MN002': {'EFG': 0.5, 'HIJ': 0.4, 'LMN': 0.1}}]
Note that in this case, I used the variable acct_name
instead of assigning to _
because we actually will use the index to "label" the dicts in the portfolios
list.
Upvotes: 3
Reputation: 39830
import pandas as pd
import numpy as np
d = np.array([['MN001','#1','ABC', 0.4],
['MN001','#1','ABD', 0.6],
['MN002', '#2', 'EFG', 0.5],
['MN002', '#2', 'HIJ', 0.4],
['MN002', '#2', 'LMN', 0.1]])
df = pd.DataFrame(data=d, columns = ['AccountName','AccountType','StockName', 'Allocation'])
by_account_df = df.groupby('AccountName').apply(lambda x : dict(zip(x['StockName'],x['Allocation']))).reset_index(name='dic'))
by_account_lst = by_account_df['dic'].values.tolist()
And the result should be:
print(by_account_lst)
[{'ABC': '0.4', 'ABD': '0.6'}, {'EFG': '0.5', 'HIJ': '0.4', 'LMN': '0.1'}]
Upvotes: 4