lost_in_pandas

Reputation: 33

Multi-column to single column in Pandas

I have the following DataFrame:

    parent          0        1      2   3
0   14026529    14062504     0      0   0
1   14103793    14036094     0      0   0
2   14025454    14036094     0      0   0
3   14030252    14030253  14062647  0   0
4   14034704    14086964     0      0   0

And I need this:

    parent_id   child_id
 0   14026529   14062504
 1   14025454   14036094
 2   14030252   14030253  
 3   14030252   14062647
 4   14103793   14036094
 5   14034704   14086964

This is just a basic example; the real data can have over 60 child columns.

Upvotes: 3

Views: 105

Answers (1)

Chris Adams

Reputation: 18647

Use DataFrame.where, stack and reset_index.
Casting to the nullable Int64 dtype first prevents the child IDs from being cast to floats once the zeros are masked out as missing values.

(df.astype('Int64').where(df.ne(0))
 .set_index('parent')
 .stack()
 .reset_index(level=0, name='child'))

[out]

     parent     child
0  14026529  14062504
0  14103793  14036094
0  14025454  14036094
0  14030252  14030253
1  14030252  14062647
0  14034704  14086964
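
As an alternative sketch (not part of the approach above), DataFrame.melt can do the same reshaping without masking, so the values stay plain integers. It assumes that 0 always means "no child", and it rebuilds the sample frame from the question so that it runs on its own. The names long_df and child_id are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question; 0 is assumed to mean "no child".
df = pd.DataFrame({
    "parent": [14026529, 14103793, 14025454, 14030252, 14034704],
    0: [14062504, 14036094, 14036094, 14030253, 14086964],
    1: [0, 0, 0, 14062647, 0],
    2: [0, 0, 0, 0, 0],
    3: [0, 0, 0, 0, 0],
})

long_df = (
    df.melt(id_vars="parent", value_name="child_id")  # one row per (parent, child column)
      .query("child_id != 0")                         # drop the zero placeholders
      .drop(columns="variable")                       # original column label is not needed
      .rename(columns={"parent": "parent_id"})
      .reset_index(drop=True)
)

print(long_df)
```

The rows come out column by column rather than grouped by parent, so add a sort_values("parent_id") step if the grouping matters.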

Upvotes: 2
