guru
guru

Reputation: 173

Convert column with list of tuples to many columns

I have a dataframe where a column contains uneven lists of tuples. The tuples will all be of same length, only the lists are uneven. I want to melt this column in-frame so that new columns will be appended to the existing one and rows will be duplicated. Like so:

df
   name     id       list_of_tuples
0  john doe    abc-123  [('cat',100,'xyz-123'),('cat',96,'uvw-456')]
1  bob smith    def-456  [('dog',98,'rst-789'),('dog',97,'opq-123'),('dog',95,'lmn-123')]
2  bob parr    ghi-789  [('tree',100,'ijk-123')]

df_new
   name            id       val_1 val_2 val_3
0  john doe        abc-123  cat   100   xyz-123
1  john doe        abc-123  cat   96    uvw-456
2  bob smith       def-456  dog   98    rst-789
3  bob smith       def-456  dog   97    opq-123
4  violet parr     def-456  dog   95    lmn-123
5  violet parr     ghi-789  tree  100   ijk-123

For my current method I'm creating a new dataframe where I'm using the chain feature from itertools, but I want to get away from creating a whole other dataframe and merging it back on the 'id' column.

Here's my current code:

df_new = pd.DataFrame(list(chain.from_iterable(df.matches)),columns=['val_1','val_2','val_3']).reset_index(drop=True)
df_new['id'] = np.repeat(df.id.values, df['list_of_tuples'].str.len()) 

Upvotes: 1

Views: 936

Answers (2)

Scott Boston
Scott Boston

Reputation: 153460

Update using pandas explode in 0.25.3+

dfi = df.explode('list_of_tuples')
df1 = pd.DataFrame(dfi['list_of_tuples'].to_list(), 
                   index=[dfi['name'], dfi['id']])\
        .add_prefix('val_')
df1.reset_index()

Output:

        name       id val_0  val_1    val_2
0   john doe  abc=123   cat    100  xyz-123
1   john doe  abc=123   cat     96  uvw-456
2  bob smith  def-456   dog     98  rst-789
3  bob smith  def-456   dog     97  opq-123
4  bob smith  def-456   dog     95  lmn-123
5   bob parr  ghi-780  tree    100  ijk-123

Let use apply with pd.Series:

df.set_index('id').list_of_tuples  #Set id as index and select list_of_tuples column
  .apply(pd.Series)                #apply pd.series to separate elements of list 
  .stack()                         #stack the elements vertically
  .apply(pd.Series)                #apply pd.Series to separate elements of tuples
  .add_prefix('val_')              #add prefix of val_ to all columns
  .reset_index()                   #Reset index to move id back into frame as column
  .drop('level_1', axis=1)         #Drop not need level_1 column from stack

Output:

        id val_0  val_1    val_2
0  abc-123   cat    100  xyz-123
1  abc-123   cat     96  uvw-456
2  def-456   dog     98  rst-789
3  def-456   dog     97  opq-123
4  def-456   dog     95  lmn-123
5  ghi-789  tree    100  ijk-123

Edited to handle question edit of adding 'name' to dataframe:

df.set_index(['name','id']).list_of_tuples
  .apply(pd.Series)
  .stack()
  .apply(pd.Series)
  .add_prefix('val_')
  .reset_index(level=-1,drop=True)
  .reset_index()

Output:

        name       id val_0  val_1    val_2
0   John Doe  abc-123   cat    100  xyz-123
1   John Doe  abc-123   cat     96  uvw-456
2  Bob Smith  def-456   dog     98  rst-789
3  Bob Smith  def-456   dog     97  opq-123
4  Bob Smith  def-456   dog     95  lmn-123
5   Bob Parr  ghi-789  tree    100  ijk-123

Upvotes: 2

BENY
BENY

Reputation: 323226

unnest your list then we do concat

s=df.list_of_tuples
pd.concat([pd.DataFrame({'id':df.id.repeat(s.str.len())}).reset_index(drop=True),pd.DataFrame(np.concatenate(s.values))],axis=1)
Out[118]: 
        id     0    1        2
0  abc-123   cat  100  xyz-123
1  abc-123   cat   96  uvw-456
2  def-456   dog   98  rst-789
3  def-456   dog   97  opq-123
4  def-456   dog   95  lmn-123
5  ghi-789  tree  100  ijk-123

Upvotes: 2

Related Questions