Reputation: 515
I have a df that looks like this
ID data1 data2
index
1 1 3 4
2 1 2 5
3 2 9 3
4 3 7 2
5 3 4 7
6 1 10 12
What I'm trying to do is append as columns all the lines that have the same ID, so that I'd get something like this
ID data2 data3 data4 data5 data6 data7
index
1 1 3 4 2 5 10 12
3 2 9 3
4 3 7 2 4 7
The problem is that I don't know how many columns I will have to append.
The column. Note that ID is NOT an index but a normal column, but the one that is used to find the duplicates.
I have already tried with pd.concat()
, but had no luck.
Upvotes: 1
Views: 1361
Reputation: 862511
You can use cumcount
for count duplicates with set_index
+ unstack
for reshaping. Then convert MultiIndex
to columns
by map
and last reset_index
for column ID
from index
.
df['g'] = df.groupby('ID').cumcount().astype(str)
df = df.set_index(['ID','g']).unstack().sort_index(axis=1, level=1)
df.columns = df.columns.map('_'.join)
df = df.reset_index()
print (df)
ID data1_0 data2_0 data1_1 data2_1 data1_2 data2_2
0 1 3.0 4.0 2.0 5.0 10.0 12.0
1 2 9.0 3.0 NaN NaN NaN NaN
2 3 7.0 2.0 4.0 7.0 NaN NaN
Solution with pivot
:
df['g'] = df.groupby('ID').cumcount().astype(str)
df = df.pivot(index='ID',columns='g').sort_index(axis=1, level=1)
df.columns = df.columns.map('_'.join)
df = df.reset_index()
print (df)
ID data1_0 data2_0 data1_1 data2_1 data1_2 data2_2
0 1 3.0 4.0 2.0 5.0 10.0 12.0
1 2 9.0 3.0 NaN NaN NaN NaN
2 3 7.0 2.0 4.0 7.0 NaN NaN
Another solution with apply
and DataFrame
constructor:
df = df.groupby('ID')['data1','data2']
.apply(lambda x: pd.DataFrame(x.values, columns=['a','b']))
.unstack()
.sort_index(axis=1, level=1)
df.columns = df.columns.map('{0[0]}_{0[1]}'.format)
df = df.reset_index()
print (df)
ID a_0 b_0 a_1 b_1 a_2 b_2
0 1 3.0 4.0 2.0 5.0 10.0 12.0
1 2 9.0 3.0 NaN NaN NaN NaN
2 3 7.0 2.0 4.0 7.0 NaN NaN
Upvotes: 1