Reputation: 10383
I have a DataFrame with an Ids column an several columns with data, like the column "value" in this example.
For this DataFrame I want to move all the values that correspond to the same id to a new column in the row as shown below:
I guess there is an opposite function to "melt" that allow this, but I'm not getting how to pivot this DF.
The dicts for the input and out DFs are:
d = {"id":[1,1,1,2,2,3,3,4,5],"value":[12,13,1,22,21,23,53,64,9]}
d2 = {"id":[1,2,3,4,5],"value1":[12,22,23,64,9],"value2":[1,21,53,"","",],"value3":[1,"","","",""]}
Upvotes: 4
Views: 3173
Reputation: 164623
You can GroupBy
to a list, then expand the series of lists:
df = pd.DataFrame(d) # create input dataframe
res = df.groupby('id')['value'].apply(list).reset_index() # groupby to list
res = res.join(pd.DataFrame(res.pop('value').values.tolist())) # expand lists to columns
print(res)
id 0 1 2
0 1 12 13.0 1.0
1 2 22 21.0 NaN
2 3 23 53.0 NaN
3 4 64 NaN NaN
4 5 9 NaN NaN
In general, such operations will be expensive as the number of columns is arbitrary. Pandas / NumPy solutions work best when you can pre-allocate memory, which isn't possible here.
Upvotes: 2
Reputation: 862511
Create MultiIndex
by cumcount
, reshape by unstack
and add change columns names by add_prefix
:
df = (df.set_index(['id',df.groupby('id').cumcount()])['value']
.unstack()
.add_prefix('value')
.reset_index())
print (df)
id value0 value1 value2
0 1 12.0 13.0 1.0
1 2 22.0 21.0 NaN
2 3 23.0 53.0 NaN
3 4 64.0 NaN NaN
4 5 9.0 NaN NaN
Missing values is possible replace by fillna
, but get mixed numeric with strings data, so some function should failed:
df = (df.set_index(['id',df.groupby('id').cumcount()])['value']
.unstack()
.add_prefix('value')
.reset_index()
.fillna(''))
print (df)
id value0 value1 value2
0 1 12.0 13 1
1 2 22.0 21
2 3 23.0 53
3 4 64.0
4 5 9.0
Upvotes: 4