Reputation: 193
I have a question about splitting columns into multiple rows at Pandas with condition.
For example, I have this dataframe which contains PersonID and some codes.
PersonID Code1 Code2 Code3 Code4 Code5 Code6 Code7 Code8 Code9 Code10
1 a12163 q934581 t7198 q3213 q21357 h026 q3213 q934581 h026 a12163
2 a12453 f54548 k654798 z98798 l1957 g498 z98798 f54548 g498 a12453
3 a19538 g193545 q98798 n2132 s6555
4 b98787 q8575 l87987 k576 l4555 j5757 k576
5 c424533 h734535 m5798 u9513 e8203 o99995 u9513 h734535 o99995
I want to split columns into rows after every 5 codes. Like this,
PersonID Code1 Code2 Code3 Code4 Code5
1 a12163 q934581 t7198 q3213 q21357
1 h026 q3213 q934581 h026 a12163
2 a12453 f54548 k654798 z98798 l1957
2 g498 z98798 f54548 g498 a12453
3 a19538 g193545 q98798 n2132 s6555
4 b98787 q8575 l87987 k576 l4555
4 j5757 k576 NaN NaN NaN
5 c424533 h734535 m5798 u9513 e8203
5 o99995 u9513 h734535 o99995 NaN
How can I do this?
Thank you for your help.
Upvotes: 2
Views: 108
Reputation: 148880
A possible way is to hide PersonID in the index, split the dataframe horizontaly and concat it back verticaly after removing empty rows in the second part and renaming its columns. In Pandas syntax it could be:
tmp = df.set_index('PersonID')
tmp1 = tmp[tmp.columns[:5].to_list()]
tmp2 = tmp[tmp.columns[5:].to_list()]
tmp2 = tmp2[tmp2.count(axis=1) > 0]
tmp2.columns = tmp1.columns
result = pd.concat([tmp1, tmp2]).sort_index().reset_index()
which gives as expected:
PersonID Code1 Code2 Code3 Code4 Code5
0 1 a12163 q934581 t7198 q3213 q21357
1 1 h026 q3213 q934581 h026 a12163
2 2 a12453 f54548 k654798 z98798 l1957
3 2 g498 z98798 f54548 g498 a12453
4 3 a19538 g193545 q98798 n2132 s6555
5 4 b98787 q8575 l87987 k576 l4555
6 4 j5757 k576 NaN NaN NaN
7 5 c424533 h734535 m5798 u9513 e8203
8 5 o99995 u9513 h734535 o99995 NaN
If you need a generalized version where the number of columns could be different, you would just have to add a loop:
#split after every fith column, whatever the total number of columns
split_at = 5
tmp = df.set_index('PersonID')
tmps = [tmp[tmp.columns[i:i+ split_at].to_list()]
for i in range(0, len(tmp.columns), split_at)]
for i, tmp in enumerate(tmps[1:], 1):
tmp.columns = tmps[0].columns
tmps[i] = tmp[tmp.count(axis=1) > 0]
result = pd.concat(tmps).sort_index().reset_index()
Upvotes: 3
Reputation: 494
Slicing Dataframe the concat them
df1 = df.iloc[:, 0:6]
df2 = df.iloc[:, 6:11]
df2['PersonID'] = df['PersonID']
df2=df2.rename(columns={'Code6': 'Code1', 'Code7': 'Code2', 'Code8': 'Code3', 'Code9': 'Code4', 'Code10': 'Code5'})
pd.concat([df1, df2]).sort_index()
Upvotes: 1