Split columns into multiple rows by condition

Question

I have a question about splitting columns into multiple rows at Pandas with condition.

For example, I have this dataframe which contains PersonID and some codes.

PersonID    Code1   Code2   Code3   Code4   Code5   Code6   Code7   Code8   Code9   Code10
        1   a12163  q934581 t7198   q3213   q21357  h026    q3213   q934581 h026    a12163
        2   a12453  f54548  k654798 z98798  l1957   g498    z98798  f54548  g498    a12453
        3   a19538  g193545 q98798  n2132   s6555                   
        4   b98787  q8575   l87987  k576    l4555   j5757   k576            
        5   c424533 h734535 m5798   u9513   e8203   o99995  u9513   h734535 o99995

I want to split columns into rows after every 5 codes. Like this,

PersonID    Code1   Code2   Code3   Code4   Code5
        1   a12163  q934581 t7198   q3213   q21357  
        1   h026    q3213   q934581 h026    a12163
        2   a12453  f54548  k654798 z98798  l1957   
        2   g498    z98798  f54548  g498    a12453
        3   a19538  g193545 q98798  n2132   s6555                   
        4   b98787  q8575   l87987  k576    l4555   
        4   j5757   k576    NaN     NaN     NaN
        5   c424533 h734535 m5798   u9513   e8203   
        5   o99995  u9513   h734535 o99995  NaN

How can I do this?

Thank you for your help.

Serge Ballesta · Accepted Answer

A possible way is to hide PersonID in the index, split the dataframe horizontaly and concat it back verticaly after removing empty rows in the second part and renaming its columns. In Pandas syntax it could be:

tmp = df.set_index('PersonID')
tmp1 = tmp[tmp.columns[:5].to_list()]
tmp2 = tmp[tmp.columns[5:].to_list()]
tmp2 = tmp2[tmp2.count(axis=1) > 0]
tmp2.columns = tmp1.columns

result = pd.concat([tmp1, tmp2]).sort_index().reset_index()

which gives as expected:

   PersonID    Code1    Code2    Code3   Code4   Code5
0         1   a12163  q934581    t7198   q3213  q21357
1         1     h026    q3213  q934581    h026  a12163
2         2   a12453   f54548  k654798  z98798   l1957
3         2     g498   z98798   f54548    g498  a12453
4         3   a19538  g193545   q98798   n2132   s6555
5         4   b98787    q8575   l87987    k576   l4555
6         4    j5757     k576      NaN     NaN     NaN
7         5  c424533  h734535    m5798   u9513   e8203
8         5   o99995    u9513  h734535  o99995     NaN

If you need a generalized version where the number of columns could be different, you would just have to add a loop:

#split after every fith column, whatever the total number of columns
split_at = 5

tmp = df.set_index('PersonID')
tmps = [tmp[tmp.columns[i:i+ split_at].to_list()]
        for i in range(0, len(tmp.columns), split_at)]
for i, tmp in enumerate(tmps[1:], 1):
    tmp.columns = tmps[0].columns
    tmps[i] = tmp[tmp.count(axis=1) > 0]

result = pd.concat(tmps).sort_index().reset_index()

Split columns into multiple rows by condition

Answers (2)

Related Questions