Aidis

Reputation: 1280

"Expanding" pandas dataframe by using cell-contained list

I have a dataframe in which the third column contains a list:

import pandas as pd 
pd.DataFrame([[1,2,['a','b','c']]])

I would like to unpack that nested list into separate rows, each with the same values in the first and second columns. The end result should be something like:

pd.DataFrame([[1,2,'a'],[1,2,'b'],[1,2,'c']])

Note that this is a simplified example. In reality I have multiple rows that I would like to "expand".

As for my progress, I have no idea how to solve this. I imagine I could take each member of the nested list together with the other column values, use a list comprehension to build more lists, and then combine those lists into a new dataframe. But this seems a bit too complex. Is there a simpler solution?

Upvotes: 1

Views: 762

Answers (2)

Agoston T

Reputation: 173

Not exactly the same issue that the OP described, but related (and more pandas-like) is the situation where you have a dict of lists of unequal lengths. In that case, you can put the data into long format like this:

import pandas as pd

my_dict = {'a': [1,2,3,4], 'b': [2,3]}
df = pd.DataFrame.from_dict(my_dict, orient='index')
df = df.unstack()  # reshape to long form (the result is a Series with a two-level index)
df = df.dropna()  # drop the NaN padding created by lists of unequal length
df.index = df.index.droplevel(level=0)  # optional: drop the position-in-list level of the index
# NOTE: this last step leaves duplicate index labels
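
As a possible alternative for the same dict, here is a minimal sketch that assumes pandas 0.25 or later, where Series.explode is available. The name long_form is only illustrative. It avoids the NaN padding, so no dropna step is needed, but note that the rows come out grouped by key rather than interleaved by position as in the unstack version.

```python
import pandas as pd

my_dict = {'a': [1, 2, 3, 4], 'b': [2, 3]}

# A Series built from a dict of lists holds one list per key.
# explode() then emits one row per list element and repeats the key
# as the index label (so index labels are duplicated here as well).
long_form = pd.Series(my_dict).explode()

# explode() returns object dtype; cast back to integers if needed.
long_form = long_form.astype(int)
print(long_form)
```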

Upvotes: 1

bananafish

Reputation: 2917

Create the dataframe with a single column, then add columns with constant values:

import pandas as pd

df = pd.DataFrame({"data": ['a', 'b', 'c']})
df['col1'] = 1
df['col2'] = 2
print(df)

This prints:

  data  col1  col2
0    a     1     2
1    b     1     2
2    c     1     2
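
For the multi-row case mentioned in the question, where the lists already sit inside an existing dataframe, a rough sketch could use DataFrame.explode. This assumes pandas 0.25 or later; the second row and the column names are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical input: two rows, each with a list in the "data" column.
df = pd.DataFrame({
    "col1": [1, 3],
    "col2": [2, 4],
    "data": [["a", "b", "c"], ["d", "e"]],
})

# explode() produces one row per list element and repeats the values
# of the other columns; reset_index gives a fresh 0..n-1 index.
expanded = df.explode("data").reset_index(drop=True)
print(expanded)
```

Without reset_index(drop=True), each expanded row keeps the index label of the row it came from, which can be handy for tracing rows back to their source.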

Upvotes: 2
