Ele

Reputation: 553

Pandas Column (list) to columns and rows

I have a DataFrame with WooCommerce orders. It contains an order id and the line items. The line items are a JSON list of items (each with nested lists of its own), prices, and quantities:

[
{u'sku': u'100111', u'total_tax': u'1.11', u'product_id': 4089, u'price': 15.878505, u'tax_class': u'reduced-rate', u'variation_id': 6627, u'taxes': [{u'total': u'1.111495', u'subtotal': u'1.111495', u'id': 35}], u'name': u'prod2', u'meta_data': [{u'value': u'100501', u'id': 74675, u'key': u'SKU'}], u'subtotal_tax': u'1.11', u'total': u'15.88', u'subtotal': u'15.88', u'id': 9956, u'quantity': 1}, 
{u'sku': u'100222', u'total_tax': u'2.29', u'product_id': 4081, u'price': 32.700935, u'tax_class': u'reduced-rate', u'variation_id': 6632, u'taxes': [{u'total': u'2.289065', u'subtotal': u'2.289065', u'id': 35}], u'name': u'prod1', u'meta_data': [{u'value': u'100302', u'id': 74685, u'key': u'SKU'}], u'subtotal_tax': u'2.29', u'total': u'32.70', u'subtotal': u'32.70', u'id': 9957, u'quantity': 1}
] 

I now need to turn every key of the items in the list into a column of the DataFrame, and I also need to expand this single row into n rows (one per item in the list).

Do you have a smart idea?

Thanks! e.

Edit: this is my input:

id    line_items
1234  [{u'sku': u'100111'}, {u'sku': u'100222'}] 

My expected output would be:

id, sku
1234, 100111
1234, 100222

Upvotes: 0

Views: 234

Answers (2)

GZ0

Reputation: 4263

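pandas.io.json.json_normalize can automatically unpack nested structures (in pandas 1.0 and later it is exposed as pandas.json_normalize, and the pandas.io.json import path is deprecated). Here is the code for your example.

import pandas as pd
from pandas.io.json import json_normalize  # on pandas >= 1.0, use pd.json_normalize instead

df = pd.DataFrame({"id": [1234], "line_items": [[{u'sku': u'100111'}, {u'sku': u'100222'}]]})

# Convert each row to a dict, then expand every entry of "line_items" into its own row,
# carrying "id" along as metadata.
dict_df = df.to_dict(orient="records")
df = json_normalize(dict_df, record_path="line_items", meta=["id"])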

The output is

      sku    id
0  100111  1234
1  100222  1234

You may need to reorder the columns of the output for your purpose.
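
As a minimal sketch of that reordering step, assuming pandas 1.0 or later (where the function is available as pd.json_normalize) and using the sample data from the question; the name flat is only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    {"id": [1234], "line_items": [[{"sku": "100111"}, {"sku": "100222"}]]}
)

# One dict per order row, then one output row per entry in "line_items".
records = df.to_dict(orient="records")
flat = pd.json_normalize(records, record_path="line_items", meta=["id"])

# json_normalize puts the record fields first and the meta fields last,
# so select the columns in the order the question asks for.
flat = flat[["id", "sku"]]
print(flat)
```

Selecting with a list of column names returns a new DataFrame, so the original column order of the normalized result is left untouched if you still need it.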

Upvotes: 1

cs95

Reputation: 402523

You'll need to flatten the dictionaries into a new DataFrame. Here is an efficient comprehension you can use to do that (the {**x} unpacking needs Python 3.5 or later):

pd.DataFrame(
    [{'id': Y, **x} for Y, X in zip(df['id'], df['line_items']) for x in X ])

     id     sku
0  1234  100111
1  1234  100222

This assumes "line_items" is a column containing lists of dictionaries. If it holds strings instead, you can convert it first using

import ast
df['line_items'] = df['line_items'].map(ast.literal_eval)
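
Putting the conversion and the comprehension together might look like the sketch below. It assumes every cell of "line_items" is the string repr of a list of dicts, as in the question's sample; the name flat is only illustrative.

```python
import ast

import pandas as pd

# Assumption: "line_items" arrived as text, e.g. after a round trip through CSV.
df = pd.DataFrame(
    {"id": [1234], "line_items": ["[{u'sku': u'100111'}, {u'sku': u'100222'}]"]}
)

# literal_eval only parses Python literals, so it does not execute arbitrary code.
df["line_items"] = df["line_items"].map(ast.literal_eval)

# One output row per item, with the order id copied onto each row.
flat = pd.DataFrame(
    [
        {"id": order_id, **item}
        for order_id, items in zip(df["id"], df["line_items"])
        for item in items
    ]
)
print(flat)
```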

Another alternative chains the lists together with itertools.chain:

from itertools import chain
from operator import itemgetter 

pd.DataFrame({
    'sku': list(
        map(itemgetter('sku'), chain.from_iterable(df['line_items'].tolist()))), 
    'id': df['id'].values.repeat(df['line_items'].str.len())})

      sku    id
0  100111  1234
1  100222  1234
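
The chaining approach can be extended to pull several keys at once. The following is only a sketch: the second order (id 5678) and the choice of fields are made up for illustration, and it assumes every item contains all of the requested keys.

```python
from itertools import chain
from operator import itemgetter

import pandas as pd

# Hypothetical sample: order 5678 is invented to show multiple orders.
df = pd.DataFrame(
    {
        "id": [1234, 5678],
        "line_items": [
            [{"sku": "100111", "quantity": 1}, {"sku": "100222", "quantity": 1}],
            [{"sku": "100333", "quantity": 2}],
        ],
    }
)

fields = ["sku", "quantity"]

# With two or more keys, itemgetter returns one tuple per item.
rows = list(map(itemgetter(*fields), chain.from_iterable(df["line_items"])))
flat = pd.DataFrame(rows, columns=fields)

# Repeat each order id once per item in that order's list.
flat.insert(0, "id", df["id"].values.repeat(df["line_items"].str.len()))
print(flat)
```

Note that itemgetter returns a bare value rather than a tuple when given a single key, so this form is meant for two or more fields.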

Upvotes: 1
