Generate new feature based on duplicate

Question

I have a pandas.DataFrame of events that looks like this:

event_name  event_person_firstname  event_person_lastname   price   year
event_a     foo                     bar                     100     2017
event_a     foo                     bar                     42      2016
event_a     foo                     bar                     90      2014
event_a     not                     same                    80      2015
event_b     random                  name                    200     2018
event_b     random                  name                    42      2010

I want to merge the rows that share the same event_name and the same person (event_person_firstname + event_person_lastname), so that there is one new column per year holding the corresponding price. If there is no price for a year, the value should be NaN or empty (I don't care which).

It's hard to explain, so here is the expected result after transforming the DataFrame above:

event_name  event_person_firstname  event_person_lastname   2018    2017    2016    2015    2014    2010
event_a     foo                     bar                     NaN     100     42      NaN     90      NaN
event_a     not                     same                    NaN     NaN     NaN     80      NaN     NaN
event_b     random                  name                    200     NaN     NaN     NaN     NaN     42

I tried grouping those duplicate rows first, but I am stuck on how to proceed after that step. I hope you can help. Thanks in advance!

Mohamed AL ANI · Accepted Answer

You can use pandas.pivot_table:

df.pivot_table(index=["event_name", "event_person_firstname", "event_person_lastname"], columns="year", values="price").reset_index()
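For completeness, here is a minimal runnable sketch of the one-liner above on the sample data. It assumes the price of 90 belongs to year 2014, as in the expected result. The names `keys` and `wide` are just illustrative. `pivot_table` sorts the year columns in ascending order, so the extra `sort_index` call is only there to put the newest year first, as in the desired output.

```python
import pandas as pd

# Sample data from the question (assumption: the 90 row is year 2014,
# matching the expected result).
df = pd.DataFrame({
    "event_name": ["event_a", "event_a", "event_a", "event_a", "event_b", "event_b"],
    "event_person_firstname": ["foo", "foo", "foo", "not", "random", "random"],
    "event_person_lastname": ["bar", "bar", "bar", "same", "name", "name"],
    "price": [100, 42, 90, 80, 200, 42],
    "year": [2017, 2016, 2014, 2015, 2018, 2010],
})

keys = ["event_name", "event_person_firstname", "event_person_lastname"]

# One row per (event, person), one column per year; missing years become NaN.
wide = df.pivot_table(index=keys, columns="year", values="price")

# Put the newest year first, then turn the index levels back into columns.
wide = wide.sort_index(axis=1, ascending=False).reset_index()
wide.columns.name = None  # drop the leftover "year" label on the column axis

print(wide)
```

Note that `pivot_table` aggregates with the mean by default, so if the same event/person/year combination appears more than once, the prices are averaged. Pass `aggfunc` (for example `aggfunc="first"` or `aggfunc="sum"`) if you need different behaviour.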
