Move values in rows in a new column in pandas

I have a DataFrame with an Ids column an several columns with data, like the column "value" in this example.

enter image description here

For this DataFrame I want to move all the values that correspond to the same id to a new column in the row as shown below:

enter image description here

I guess there is an opposite function to "melt" that allow this, but I'm not getting how to pivot this DF.

The dicts for the input and out DFs are:

d = {"id":[1,1,1,2,2,3,3,4,5],"value":[12,13,1,22,21,23,53,64,9]}
d2 = {"id":[1,2,3,4,5],"value1":[12,22,23,64,9],"value2":[1,21,53,"","",],"value3":[1,"","","",""]}

Upvotes: 4

Views: 3173

Answers (2)

jpp
jpp

Reputation: 164623

You can GroupBy to a list, then expand the series of lists:

df = pd.DataFrame(d)                                            # create input dataframe
res = df.groupby('id')['value'].apply(list).reset_index()       # groupby to list
res = res.join(pd.DataFrame(res.pop('value').values.tolist()))  # expand lists to columns

print(res)

   id   0     1    2
0   1  12  13.0  1.0
1   2  22  21.0  NaN
2   3  23  53.0  NaN
3   4  64   NaN  NaN
4   5   9   NaN  NaN

In general, such operations will be expensive as the number of columns is arbitrary. Pandas / NumPy solutions work best when you can pre-allocate memory, which isn't possible here.

Upvotes: 2

jezrael
jezrael

Reputation: 862511

Create MultiIndex by cumcount, reshape by unstack and add change columns names by add_prefix:

df = (df.set_index(['id',df.groupby('id').cumcount()])['value']
        .unstack()
        .add_prefix('value')
        .reset_index())
print (df)

   id  value0  value1  value2
0   1    12.0    13.0     1.0
1   2    22.0    21.0     NaN
2   3    23.0    53.0     NaN
3   4    64.0     NaN     NaN
4   5     9.0     NaN     NaN

Missing values is possible replace by fillna, but get mixed numeric with strings data, so some function should failed:

df = (df.set_index(['id',df.groupby('id').cumcount()])['value']
        .unstack()
        .add_prefix('value')
        .reset_index()
        .fillna(''))
print (df)

   id  value0 value1 value2
0   1    12.0     13      1
1   2    22.0     21       
2   3    23.0     53       
3   4    64.0              
4   5     9.0          

Upvotes: 4

Related Questions