Dawn.Sahil
Dawn.Sahil

Reputation: 105

How to explode columns in a dataframe which has the complete list for every row which needs to be exploded in pandas?

I have a dataframe which looks something like this:

ex = [{"A": "Germany", "word": "Danke", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]}, 
  {"A": "French", "word": "merci", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]}, 
  {"A": "English", "word": "Thank you", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]}]

df = pd.DataFrame(ex)

A       word        explode1                                            explode2
Germany Danke       [[berlin, frankfurt], [Paris], [London, Glassgow]]  [[food, cars], [fashion], [pound, cricket]]
French  merci       [[berlin, frankfurt], [Paris], [London, Glassgow]]  [[food, cars], [fashion], [pound, cricket]]
English Thank you   [[berlin, frankfurt], [Paris], [London, Glassgow]]  [[food, cars], [fashion], [pound, cricket]]

I want to explode the dataframe based on the columns "explode1" and "explode2". If you notice they have the same number of list items ie:3, but the list contains all the items for the rows. So I want to explode the dataframe to look something like this:

A       word        explode1    explode2
Germany Danke       [berlin]    [food]
Germany Danke       [frankfurt] [cars]
French  merci       [Paris]     [fashion]
English Thank you   [London]    [pound]
English Thank you   [Glassgow]  [cricket]

How do I do this? Was looking at the pandas.explode option but how we handle the complete list in every row because for eg: I only want the first row to expand based on the first item of the list in the "explode1" and "explode2" column not on the others items in the list.

Upvotes: 1

Views: 216

Answers (1)

RJ Adriaansen
RJ Adriaansen

Reputation: 9619

explode won't really work since the list items have to merged by index first. A possible workflow it to do this first, then explode on to_explode, finally creating a new df using concat:

import pandas as pd 

ex = [{"A": "Germany", "word": "Danke", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]}, 
  {"A": "French", "word": "merci", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]}, 
  {"A": "English", "word": "Thank you", "explode1": [["berlin", "frankfurt"], ["Paris"], ["London", "Glassgow"]], "explode2": [["food", "cars"], ["fashion"], ["pound", "cricket"]]}]

df = pd.DataFrame(ex)

df['to_explode'] = df.apply(lambda row: list(zip(row['explode1'][row.name], row['explode2'][row.name])), axis=1)
df = df[['A', 'word', 'to_explode']].explode('to_explode').reset_index(drop=True)

result_df = pd.concat([df[['A', 'word']], pd.DataFrame(df['to_explode'].values.tolist(), columns=['explode1', 'explode2'])], axis=1)

Result:

A word explode1 explode2
0 Germany Danke berlin food
1 Germany Danke frankfurt cars
2 French merci Paris fashion
3 English Thank you London pound
4 English Thank you Glassgow cricket

Upvotes: 1

Related Questions