kubri81
kubri81

Reputation: 31

How to transform a complicated list to dataframe in Python

I have such a list

[9308, '127.05', [{'id': 8568, 'name': 'some product name', 'product_id': 4204, 'variation_id': 0, 'quantity': 1, 'tax_class': '', 'subtotal': '139.00', 'subtotal_tax': '0.00', 'total': '118.15', 'total_tax': '0.00', 'taxes': [], 'meta_data': [], 'sku': '', 'price': 118.15}], 9306, '98.89', [{'id': 8566, 'name': 'some product name', 'product_id': 4200, 'variation_id': 0, 'quantity': 1, 'tax_class': '', 'subtotal': '89.99', 'subtotal_tax': '0.00', 'total': '89.99', 'total_tax': '0.00', 'taxes': [], 'meta_data': [], 'sku': '', 'price': 89.99}]

I'd like to transform it into a data frame that looks like:

ID   Total Value     Product IDs
9308   127.05        4204
9306   98.89         4200
etc.

also some IDs could have few product ids, so the list should look like:

ID   Total Value     Product IDs
9308   127.05        4204
9308   127.05        4200
9308   127.05        5555

Can anyone help me? I'm a very beginner in Python.

Upvotes: 1

Views: 106

Answers (3)

jedi
jedi

Reputation: 585

The assumption is that the array has a structure of ID, Total Value and product info that is encoded as json. If we loop through that array three items at a time, then the below should work fine:

def unzip(i,v,d):
    return pd.DataFrame(d).assign(ID=i, TotalValue=v )
df = pd.concat([unzip(i,v,d) for i, v, d in zip(*[iter(js)]*3)])
df[['ID','TotalValue', 'product_id']]

Example output:

enter image description here

Notes: Details on how to iterate multiple items at a time. When working with json, I find it easier to create a dataframe first, and then add additional columns that are broadcasted to all dataframe rows.

Assumption: input string was missing closing "]" otherwise it is not an array. The above code works with this input.

js = [ 9308, '127.05', [{'id': 8568, 'name': 'some product name', 'product_id': 4204, 'variation_id': 0, 'quantity': 1, 'tax_class': '', 'subtotal': '139.00', 'subtotal_tax': '0.00', 'total': '118.15', 'total_tax': '0.00', 'taxes': [], 'meta_data': [], 'sku': '', 'price': 118.15}], 
        9306, '98.89',  [{'id': 8566, 'name': 'some product name', 'product_id': 4200, 'variation_id': 0, 'quantity': 1, 'tax_class': '', 'subtotal': '89.99', 'subtotal_tax': '0.00', 'total': '89.99', 'total_tax': '0.00', 'taxes': [], 'meta_data': [], 'sku': '', 'price': 89.99}]]

Upvotes: 1

Cameron Riddell
Cameron Riddell

Reputation: 13407

You can use the grouper recipe from itertools to easily do this. https://docs.python.org/3/library/itertools.html#itertools-recipes

# Copied from itertools recipes link above
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

Then you use that to iterate through your data in chunks of 3, and combine it into a defaultdict to build your data into the correct format for a DataFrame

import collections

clean_data = collections.defaultdict(list)

for entry_id, total_value, product_json in grouper(data, 3):
    for record in product_json:
        clean_data["id"].append(entry_id)
        clean_data["total_value"].append(total_value)
        clean_data["product_id"].append(record["product_id"])
    
df = pd.DataFrame(clean_data)
print(df)
     id total_value  product_id
0  9308      127.05        4204
1  9306       98.89        4200

This will also handle the case where you have more than 1 record in the 3rd of your data (e.g. if you have 2 dictionaries in your list, instead of just 1)

Upvotes: 1

Scott Boston
Scott Boston

Reputation: 153460

IIUC, and your input list is really structured well, then you can use some hard coding like this:

df_out = pd.concat([pd.DataFrame(l[::3], columns=['ID']), 
                    pd.DataFrame(l[1::3], columns=['Total Value']), 
                    pd.concat([pd.DataFrame(i) for i in l[2::3]], ignore_index=True)
                      .loc[:, 'product_id']], axis=1)

Output:

     ID Total Value  product_id
0  9308      127.05        4204
1  9306       98.89        4200

Upvotes: 1

Related Questions