Reputation: 87
Given the pandas DataFrame:
name hobby since
paul A 1995
john A 2005
paul B 2015
mary G 2013
chris E 2005
chris D 2001
paul C 1986
I would like to get:
name hobby1 since1 hobby2 since2 hobby3 since3
paul A 1995 B 2015 C 1986
john A 2005 NaN NaN NaN NaN
mary G 2013 NaN NaN NaN NaN
chris E 2005 D 2001 NaN NaN
I.e. I would like to have one row per name. The maximum number of hobbies a person can have, say 3 in this case, is something I know in advance. What would be the most elegant/short way to do this?
Upvotes: 3
Views: 72
Reputation: 25239
Use cumcount
and unstack
. Finally, use multiindex.map
to join 2-level columns to one level
df1 = df.set_index(['name', df.groupby('name').cumcount().add(1)]) \
.unstack().sort_index(1,level=1)
df1.columns = df1.columns.map('{0[0]}{0[1]}'.format)
Out[812]:
hobby1 since1 hobby2 since2 hobby3 since3
name
chris E 2005.0 D 2001.0 NaN NaN
john A 2005.0 NaN NaN NaN NaN
mary G 2013.0 NaN NaN NaN NaN
paul A 1995.0 B 2015.0 C 1986.0
Upvotes: 2
Reputation: 1514
Maybe something like this? But you would need to rename the columns after with this solution.
df["combined"] = [ "{}_{}".format(x,y) for x,y in zip(df.hobby,df.since)]
df.groupby("name")["combined"]
.agg(lambda x: "_".join(x))
.str.split("_",expand=True)
The result is:
0 1 2 3 4 5
name
chris E 2005 D 2001 None None
john A 2005 None None None None
mary G 2013 None None None None
paul A 1995 B 2015 C 1986
Upvotes: 1
Reputation: 75080
You can first melt
and then , groupby.cumcount()
to add to the variable
and then pivot using pivot_table()
:
m=df.melt('name')
(m.assign(variable=m.variable+(m.groupby(['name','variable']).cumcount()+1).astype(str))
.pivot_table(index='name',columns='variable',values='value',aggfunc='first')
.rename_axis(None,axis=1))
hobby1 hobby2 hobby3 since1 since2 since3
name
chris E D NaN 2005 2001 NaN
john A NaN NaN 2005 NaN NaN
mary G NaN NaN 2013 NaN NaN
paul A B C 1995 2015 1986
Upvotes: 3