Reputation: 31
I have such a list
[9308, '127.05', [{'id': 8568, 'name': 'some product name', 'product_id': 4204, 'variation_id': 0, 'quantity': 1, 'tax_class': '', 'subtotal': '139.00', 'subtotal_tax': '0.00', 'total': '118.15', 'total_tax': '0.00', 'taxes': [], 'meta_data': [], 'sku': '', 'price': 118.15}], 9306, '98.89', [{'id': 8566, 'name': 'some product name', 'product_id': 4200, 'variation_id': 0, 'quantity': 1, 'tax_class': '', 'subtotal': '89.99', 'subtotal_tax': '0.00', 'total': '89.99', 'total_tax': '0.00', 'taxes': [], 'meta_data': [], 'sku': '', 'price': 89.99}]
I'd like to transform it into a data frame that looks like:
ID Total Value Product IDs
9308 127.05 4204
9306 98.89 4200
etc.
also some IDs could have few product ids, so the list should look like:
ID Total Value Product IDs
9308 127.05 4204
9308 127.05 4200
9308 127.05 5555
Can anyone help me? I'm a very beginner in Python.
Upvotes: 1
Views: 106
Reputation: 585
The assumption is that the array has a structure of ID, Total Value and product info that is encoded as json. If we loop through that array three items at a time, then the below should work fine:
def unzip(i,v,d):
return pd.DataFrame(d).assign(ID=i, TotalValue=v )
df = pd.concat([unzip(i,v,d) for i, v, d in zip(*[iter(js)]*3)])
df[['ID','TotalValue', 'product_id']]
Example output:
Notes: Details on how to iterate multiple items at a time. When working with json, I find it easier to create a dataframe first, and then add additional columns that are broadcasted to all dataframe rows.
Assumption: input string was missing closing "]" otherwise it is not an array. The above code works with this input.
js = [ 9308, '127.05', [{'id': 8568, 'name': 'some product name', 'product_id': 4204, 'variation_id': 0, 'quantity': 1, 'tax_class': '', 'subtotal': '139.00', 'subtotal_tax': '0.00', 'total': '118.15', 'total_tax': '0.00', 'taxes': [], 'meta_data': [], 'sku': '', 'price': 118.15}],
9306, '98.89', [{'id': 8566, 'name': 'some product name', 'product_id': 4200, 'variation_id': 0, 'quantity': 1, 'tax_class': '', 'subtotal': '89.99', 'subtotal_tax': '0.00', 'total': '89.99', 'total_tax': '0.00', 'taxes': [], 'meta_data': [], 'sku': '', 'price': 89.99}]]
Upvotes: 1
Reputation: 13407
You can use the grouper
recipe from itertools
to easily do this. https://docs.python.org/3/library/itertools.html#itertools-recipes
# Copied from itertools recipes link above
from itertools import zip_longest
def grouper(iterable, n, fillvalue=None):
"Collect data into fixed-length chunks or blocks"
# grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx"
args = [iter(iterable)] * n
return zip_longest(*args, fillvalue=fillvalue)
Then you use that to iterate through your data in chunks of 3, and combine it into a defaultdict
to build your data into the correct format for a DataFrame
import collections
clean_data = collections.defaultdict(list)
for entry_id, total_value, product_json in grouper(data, 3):
for record in product_json:
clean_data["id"].append(entry_id)
clean_data["total_value"].append(total_value)
clean_data["product_id"].append(record["product_id"])
df = pd.DataFrame(clean_data)
print(df)
id total_value product_id
0 9308 127.05 4204
1 9306 98.89 4200
This will also handle the case where you have more than 1 record in the 3rd of your data (e.g. if you have 2 dictionaries in your list, instead of just 1)
Upvotes: 1
Reputation: 153460
IIUC, and your input list is really structured well, then you can use some hard coding like this:
df_out = pd.concat([pd.DataFrame(l[::3], columns=['ID']),
pd.DataFrame(l[1::3], columns=['Total Value']),
pd.concat([pd.DataFrame(i) for i in l[2::3]], ignore_index=True)
.loc[:, 'product_id']], axis=1)
Output:
ID Total Value product_id
0 9308 127.05 4204
1 9306 98.89 4200
Upvotes: 1