Reputation: 105
I have a dataframe which looks something like this:
ex = [{"A": "Germany", "word": "Danke", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]},
{"A": "French", "word": "merci", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]},
{"A": "English", "word": "Thank you", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]}]
df = pd.DataFrame(ex)
A word explode1 explode2
Germany Danke [[berlin, frankfurt], [Paris], [London, Glassgow]] [[food, cars], [fashion], [pound, cricket]]
French merci [[berlin, frankfurt], [Paris], [London, Glassgow]] [[food, cars], [fashion], [pound, cricket]]
English Thank you [[berlin, frankfurt], [Paris], [London, Glassgow]] [[food, cars], [fashion], [pound, cricket]]
I want to explode the dataframe based on the columns "explode1" and "explode2". If you notice they have the same number of list items ie:3, but the list contains all the items for the rows. So I want to explode the dataframe to look something like this:
A word explode1 explode2
Germany Danke [berlin] [food]
Germany Danke [frankfurt] [cars]
French merci [Paris] [fashion]
English Thank you [London] [pound]
English Thank you [Glassgow] [cricket]
How do I do this? Was looking at the pandas.explode option but how we handle the complete list in every row because for eg: I only want the first row to expand based on the first item of the list in the "explode1" and "explode2" column not on the others items in the list.
Upvotes: 1
Views: 216
Reputation: 9619
explode
won't really work since the list items have to merged by index first. A possible workflow it to do this first, then explode on to_explode
, finally creating a new df using concat
:
import pandas as pd
ex = [{"A": "Germany", "word": "Danke", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]},
{"A": "French", "word": "merci", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]},
{"A": "English", "word": "Thank you", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]}]
df = pd.DataFrame(ex)
df['to_explode'] = df.apply(lambda row: list(zip(row['explode1'][row.name], row['explode2'][row.name])), axis=1)
df = df[['A', 'word', 'to_explode']].explode('to_explode').reset_index(drop=True)
result_df = pd.concat([df[['A', 'word']], pd.DataFrame(df['to_explode'].values.tolist(), columns=['explode1', 'explode2'])], axis=1)
Result:
A | word | explode1 | explode2 | |
---|---|---|---|---|
0 | Germany | Danke | berlin | food |
1 | Germany | Danke | frankfurt | cars |
2 | French | merci | Paris | fashion |
3 | English | Thank you | London | pound |
4 | English | Thank you | Glassgow | cricket |
Upvotes: 1