Reputation: 1393
I currently have a dataframe (df) like this:
name info
alpha foo,bar
alpha bar,foo
beta foo,bar
beta bar,foo
beta baz,qux
I'm looking to create a dataframe like this:
name info
alpha (foo,bar),(bar,foo)
beta (foo,bar),(bar,foo),(baz,qux)
I'm getting close with groupby.apply(list), e.g.:
new_df=df.groupby('name')['info'].apply(list)
However, I can't figure out how to get the output back into the original dataframe format, i.e. with two columns, as in the example above.
I think I need reset_index and unstack? Appreciate any help!
Upvotes: 3
Views: 1676
Reputation: 24593
Try the following, using a for loop:
uniqnames = df.name.unique()          # get unique names
newdata = []                          # data list for output dataframe
for u in uniqnames:                   # for each unique name
    subdf = df[df.name == u]          # get rows with this unique name
    s = ""
    for i in subdf['info']:
        s += "(" + i + "),"           # join all info cells for that name
    newdata.append([u, s[:-1]])       # remove trailing comma from infos & add row to data list
newdf = pd.DataFrame(data=newdata, columns=['name', 'info'])
print(newdf)
Output is exactly as desired:
name info
0 alpha (foo,bar),(bar,foo)
1 beta (foo,bar),(bar,foo),(baz,qux)
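For comparison, the same loop can be written by iterating over a groupby object instead of filtering the frame once per name. This is only a sketch: the sample frame is rebuilt from the question's data, and the names rows and loop_df are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["alpha", "alpha", "beta", "beta", "beta"],
    "info": ["foo,bar", "bar,foo", "foo,bar", "bar,foo", "baz,qux"],
})

# Iterating a grouped Series yields (group key, sub-Series) pairs;
# sort=False keeps the names in order of first appearance.
rows = [
    (name, ",".join("(" + s + ")" for s in infos))
    for name, infos in df.groupby("name", sort=False)["info"]
]

# loop_df is a hypothetical name for the two-column result
loop_df = pd.DataFrame(rows, columns=["name", "info"])
print(loop_df)
```

Using ",".join here also avoids having to strip a trailing comma afterwards.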
Upvotes: 2
Reputation: 323316
IIUC
df.assign(info='('+df['info']+')').groupby('name')['info'].apply(','.join).to_frame('info')
Out[267]:
info
name
alpha (foo,bar),(bar,foo)
beta (foo,bar),(bar,foo),(baz,qux)
#df.assign(info='('+df['info']+')') # wraps each info string in ( and ) so it matches the desired output
#groupby('name') # groups the rows by name, since the info values under the same name need to be merged
#apply(','.join).to_frame('info') # joins the info strings of each group into one comma-separated string
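The result above keeps name as the index. If name should be an ordinary column, as in the question's desired output, one option is to end the chain with reset_index() instead of to_frame('info'). A minimal sketch, assuming the question's sample data; flat_df is just an illustrative name.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["alpha", "alpha", "beta", "beta", "beta"],
    "info": ["foo,bar", "bar,foo", "foo,bar", "bar,foo", "baz,qux"],
})

flat_df = (
    df.assign(info="(" + df["info"] + ")")  # wrap each string in parentheses
      .groupby("name")["info"]              # collect the info strings per name
      .apply(",".join)                      # one comma-separated string per group
      .reset_index()                        # move 'name' from the index to a column
)
print(flat_df)
```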
Upvotes: 1
Reputation: 153500
IIUC:
df = pd.DataFrame({'name':['alpha']*2+['beta']*3,
'info':[['foo','bar'],['bar','foo'],
['foo','bar'],['bar','foo'],
['baz','qux']]})
print(df)
Input:
info name
0 [foo, bar] alpha
1 [bar, foo] alpha
2 [foo, bar] beta
3 [bar, foo] beta
4 [baz, qux] beta
Now use groupby and apply(list), then call reset_index() to turn the resulting Series back into a dataframe:
new_df = df.groupby('name')['info'].apply(list)
new_df = new_df.reset_index()
print(new_df)
Output:
name info
0 alpha [[foo, bar], [bar, foo]]
1 beta [[foo, bar], [bar, foo], [baz, qux]]
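This answer assumes that info already holds lists. If info holds comma-separated strings, as in the question's sample, the strings could be split first. The sketch below uses tuples rather than lists, which is a choice made here and not part of the answer above; tuple_df is an illustrative name.

```python
import pandas as pd

# Sample data copied from the question (info stored as plain strings)
df = pd.DataFrame({
    "name": ["alpha", "alpha", "beta", "beta", "beta"],
    "info": ["foo,bar", "bar,foo", "foo,bar", "bar,foo", "baz,qux"],
})

tuple_df = (
    # "foo,bar" -> ("foo", "bar")
    df.assign(info=df["info"].str.split(",").apply(tuple))
      .groupby("name")["info"]
      .apply(list)       # gather the tuples of each name into one list
      .reset_index()     # Series -> two-column dataframe
)
print(tuple_df)
```

Each info cell then holds a real Python list of tuples, not a formatted string, which is easier to process further but prints with square brackets.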
Upvotes: 0