user9910379
user9910379

Reputation: 205

Make dictionary with specific format from pandas dataframe

Trying to convert the dataframe

enter image description here

Into a dictionary with this specific structure:

    sales = { 
   "clients": [
       {"ID_client": "241341", 
       "purchases": [
            "Item 101",
            "Item 202",
            "Item 324",
        ],
        "payment": [
            "visa", "master", "visa"
        ]           
       },
       {"ID_client": "24356", 
       "purchases": [
            "Item 2320",
            "Item 2342",
            "Item 5604",
        ],
        "payment": [
            "diners", "cash", "diners"
        ]           
       },    
       {"ID_client": "5534", 
       "purchases": [
            "Item 50563",
            "Item 52878",
            "Item 54233",
        ],
        "payment": [
            "diners", "master", "visa"
        ]           
       }       
   ]

}

I've been trying with some for loops like:

 d = {"sales":[]}
  for i in df1['ID_Client'].unique():/
     clients = {"ID_client": df1['ID_client'][i]}
     d[i] = [{df1['purchases'][j]: df1['payment'][j]} for j in 
     df1[df1['ID_Client']==i].index]

Any help will be greatly appreciated. Thanks in advance.

Upvotes: 0

Views: 579

Answers (3)

Valdi_Bo
Valdi_Bo

Reputation: 30991

Assuming that your source DataFrame is df, your dictionary can be generated with a single instruction:

sales = { 'clients' : df.groupby(df.ID_client)\
    .agg(list)\
    .reset_index()\
    .set_axis(['ID_client', 'purchases', 'payment'], axis=1, inplace=False)\
    .to_dict('records')
}

As you can see, creation of the array - content of clients consists actually of the following chain:

  • groupby - group df by ID_client,
  • agg - aggregate all the remaining columns (Purchase and Payment), into lists of values, for now ID_client is the index,
  • reset_index to make ID_client an ordinary data column,
  • set_axis - change column names (the names of target components are slightly different from source column names),
  • to_dict - create the result dictionary.

The content of sales variable is:

{'clients': [{'ID_client': '241341',
   'purchases': ['Item 101', 'Item 202', 'Item 324'],
   'payment': ['visa', 'master', 'visa']},
  {'ID_client': '24356',
   'purchases': ['Item 2320', 'Item 2342', 'Item 5604'],
   'payment': ['diners', 'cash', 'diners']},
  {'ID_client': '5534',
   'purchases': ['Item 50563', 'Item 52878', 'Item 54233'],
   'payment': ['diners', 'master', 'visa']}]}

just as you wish.

Upvotes: 1

WisdomWolf
WisdomWolf

Reputation: 21

I think you were on the right track with the start of your loop, but you'll be better off using the results to filter the dataframe. Should be more efficient and more panda like (pandaren?) to do it that way.

Example:

clients = []
for c in df1['ID_client'].unique():
    client = {
        'ID_client': c, 
        'purchases': list(df1[df1.ID_client == c].loc[:, 'Purchase']), 
        'payment': list(df1[df1.ID_client == c].loc[:, 'Payment'])
    }
    clients.append(client)

Upvotes: 1

jpp
jpp

Reputation: 164693

Here's one way using np.repeat and itertools.chain:

import pandas as pd, numpy as np
from itertools import chain

df = pd.DataFrame(sales['clients'])

res = pd.DataFrame({'ID_client': np.repeat(df['ID_client'], df['payment'].map(len)),
                    'payment': list(chain.from_iterable(df['payment'])),
                    'purchases': list(chain.from_iterable(df['purchases']))})

print(res)

  ID_client payment   purchases
0    241341    visa    Item 101
0    241341  master    Item 202
0    241341    visa    Item 324
1     24356  diners   Item 2320
1     24356    cash   Item 2342
1     24356  diners   Item 5604
2      5534  diners  Item 50563
2      5534  master  Item 52878
2      5534    visa  Item 54233

Note with this method each unique index aligns with an ID_client, as per your input.

Upvotes: 1

Related Questions