Reputation: 173
I have a dataframe where a column contains uneven lists of tuples. The tuples will all be of same length, only the lists are uneven. I want to melt this column in-frame so that new columns will be appended to the existing one and rows will be duplicated. Like so:
df
name id list_of_tuples
0 john doe abc-123 [('cat',100,'xyz-123'),('cat',96,'uvw-456')]
1 bob smith def-456 [('dog',98,'rst-789'),('dog',97,'opq-123'),('dog',95,'lmn-123')]
2 bob parr ghi-789 [('tree',100,'ijk-123')]
df_new
name id val_1 val_2 val_3
0 john doe abc-123 cat 100 xyz-123
1 john doe abc-123 cat 96 uvw-456
2 bob smith def-456 dog 98 rst-789
3 bob smith def-456 dog 97 opq-123
4 violet parr def-456 dog 95 lmn-123
5 violet parr ghi-789 tree 100 ijk-123
For my current method I'm creating a new dataframe where I'm using the chain feature from itertools, but I want to get away from creating a whole other dataframe and merging it back on the 'id' column.
Here's my current code:
df_new = pd.DataFrame(list(chain.from_iterable(df.matches)),columns=['val_1','val_2','val_3']).reset_index(drop=True)
df_new['id'] = np.repeat(df.id.values, df['list_of_tuples'].str.len())
Upvotes: 1
Views: 936
Reputation: 153460
Update using pandas explode
in 0.25.3+
dfi = df.explode('list_of_tuples')
df1 = pd.DataFrame(dfi['list_of_tuples'].to_list(),
index=[dfi['name'], dfi['id']])\
.add_prefix('val_')
df1.reset_index()
Output:
name id val_0 val_1 val_2
0 john doe abc=123 cat 100 xyz-123
1 john doe abc=123 cat 96 uvw-456
2 bob smith def-456 dog 98 rst-789
3 bob smith def-456 dog 97 opq-123
4 bob smith def-456 dog 95 lmn-123
5 bob parr ghi-780 tree 100 ijk-123
Let use apply
with pd.Series
:
df.set_index('id').list_of_tuples #Set id as index and select list_of_tuples column
.apply(pd.Series) #apply pd.series to separate elements of list
.stack() #stack the elements vertically
.apply(pd.Series) #apply pd.Series to separate elements of tuples
.add_prefix('val_') #add prefix of val_ to all columns
.reset_index() #Reset index to move id back into frame as column
.drop('level_1', axis=1) #Drop not need level_1 column from stack
Output:
id val_0 val_1 val_2
0 abc-123 cat 100 xyz-123
1 abc-123 cat 96 uvw-456
2 def-456 dog 98 rst-789
3 def-456 dog 97 opq-123
4 def-456 dog 95 lmn-123
5 ghi-789 tree 100 ijk-123
Edited to handle question edit of adding 'name' to dataframe:
df.set_index(['name','id']).list_of_tuples
.apply(pd.Series)
.stack()
.apply(pd.Series)
.add_prefix('val_')
.reset_index(level=-1,drop=True)
.reset_index()
Output:
name id val_0 val_1 val_2
0 John Doe abc-123 cat 100 xyz-123
1 John Doe abc-123 cat 96 uvw-456
2 Bob Smith def-456 dog 98 rst-789
3 Bob Smith def-456 dog 97 opq-123
4 Bob Smith def-456 dog 95 lmn-123
5 Bob Parr ghi-789 tree 100 ijk-123
Upvotes: 2
Reputation: 323226
unnest your list then we do concat
s=df.list_of_tuples
pd.concat([pd.DataFrame({'id':df.id.repeat(s.str.len())}).reset_index(drop=True),pd.DataFrame(np.concatenate(s.values))],axis=1)
Out[118]:
id 0 1 2
0 abc-123 cat 100 xyz-123
1 abc-123 cat 96 uvw-456
2 def-456 dog 98 rst-789
3 def-456 dog 97 opq-123
4 def-456 dog 95 lmn-123
5 ghi-789 tree 100 ijk-123
Upvotes: 2