Reputation: 91
I have a dataframe that looks like this:
Value group_id
23 1
28 1
32 1
....
35 12
23 12
42 12
There are 6 unique group_id in total, each with few hundred elements. I want to convert this into a dataframe that has the value for a single 'group_id' in one single row. So I want my dataframe to look like this:
value_1 value_2 value_3 group_id
23 28 32 1
....
35 23 42 12
I tried using pandas.melt and pandas.groupby but failed to get any results.
Upvotes: 2
Views: 73
Reputation: 10359
Use groupby
with cumcount
to fill in your 'value_'
records, and then use .pivot_table
on the result:
df['val_grp'] = 'value_' + (df.groupby('group_id').cumcount() + 1).astype(str)
pvt = df.pivot_table(index='group_id', columns='val_grp', values='Value')
print(pvt)
Prints:
val_grp value_1 value_2 value_3
group_id
1 23 28 32
12 35 23 42
As mentioned in the comments by andrew_reece, this can be done in a single line by doing:
pvt = df.assign(idx=(df.groupby("group_id").cumcount()+1).astype(str).str.replace("^(.)", "value_\\1", regex=True)).pivot(index="group_id", columns="idx", values="Value")
However, it should be noted that this requires Pandas 23.0 or higher, as this is when the regex
parameter for str.replace
was introduced.
Upvotes: 2
Reputation: 294338
You are missing the the position of the 'Value'
within each group. We can create it with groupby.cumcount
df.set_index(
['group_id', df.groupby('group_id').cumcount() + 1]
).Value.unstack().add_prefix('Value_').reset_index()
group_id Value_1 Value_2 Value_3
0 1 23 28 32
1 12 35 23 42
Upvotes: 3