Reputation: 2063
I have a pandas dataframe of the form
advert_id run_id category score
11111111 78 842 0.356975
11111111 78 849 0.245583
11111111 78 950 0.089219
11111111 78 1645 0.089172
11111111 78 2494 0.044254
... ... ... ...
22222222 1083267 2521 0.078275
22222222 1083267 2553 0.121556
22222222 1083267 2872 0.039226
22222222 1083267 3045 0.362127
22222222 1083267 3049 0.040135
And would like to transform it to a dataframe of the form (one row now per advert_id):
advert_id run_id category_1 score_1 category_2 score_2 category_3 score_3 ... category_n score_n
11111111 78 842 0.356975 849 0.245583 950 0.089219 ...
22222222 1083267 2521 0.078275 2553 0.121556 2872 0.039226 ...
The number of category per advert can vary, some adverts may have 1..n categories.
Is there an elegant way to do this with python/pandas other than grouping the dataframe and "manually" iterating over the groups and populating a separate dataframe?
Upvotes: 2
Views: 255
Reputation: 1360
You could use pivot
to change the dataframe to the wide format. As a raw output, the pivot will leave you with a hierarchical index on the columns but you can flatten that to make the columns look like you wanted above.
# add a key as in solution above from BEN_YO
df['temp_key']=(df.groupby('advert_id').cumcount()+1)
# do the pivot
df = df.pivot(index='advert_id', columns="temp_key", values=["category", "score"])
# make the columns look as expected
df.columns = [' '.join(col).strip() for col in df.columns.values]
print(df.head())
The result is as desired:
category 1 category 2 category 3 ... score 4 score 5
advert_id ...
11111111 842.0 849.0 950.0 ... 0.089172 0.044254
22222222 2521.0 2553.0 2872.0 ... 0.362127 0.040135
Upvotes: 2
Reputation: 323386
After create the additional key with cumcount
df['key2']=(df.groupby('advert_id').cumcount()+1)
s=df.set_index(['advert_id','run_id','key2']).unstack().sort_index(level=1,axis=1)
s.columns=s.columns.map('{0[0]}_{0[1]}'.format)
s
Out[59]:
category_1 score_1 ... category_5 score_5
advert_id run_id ...
11111111 78 842 0.356975 ... 2494 0.044254
22222222 1083267 2521 0.078275 ... 3049 0.040135
[2 rows x 10 columns]
Upvotes: 6