Reputation: 231
I have the following data:
  board_href_deals       items  test1
0            test2  {'x': 'a'}  test1
1            test2  {'x': 'b'}  test2
After grouping by "board_href_deals", I would like to collect the values of the remaining columns into lists, like this:
  board_href_deals                     items               test1
0            test2  [{'x': 'a'}, {'x': 'b'}]  ['test1', 'test2']
Thank you.
Upvotes: 2
Views: 69
Reputation: 164773
An alternative solution, especially on older versions of pandas, is to use GroupBy + apply with list on each column in turn, then combine the results via concat.
Benchmarking on Python 3.6.0 / pandas 0.19.2. This contrived example has a small number of groups; you should test with your own data if efficiency is a concern.
import pandas as pd

df = pd.DataFrame({'A': ['test2', 'test2', 'test4', 'test4'],
                   'B': [{'x': 'a'}, {'x': 'b'}, {'y': 'a'}, {'y': 'b'}],
                   'C': ['test1', 'test2', 'test3', 'test4']})

# repeat the frame to get a larger benchmark input
df = pd.concat([df]*10000)

def jpp(df):
    g = df.groupby('A')
    # one Series of lists per column, indexed by group key
    L = [g[col].apply(list) for col in ['B', 'C']]
    return pd.concat(L, axis=1).reset_index()

%timeit jpp(df)                                 # 11.3 ms per loop
%timeit df.groupby('A').agg(lambda x: list(x))  # 20.5 ms per loop
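To make the per-column approach easier to follow without the benchmark scaffolding, here is a minimal sketch applied to the two-row frame from the question. The variable names `parts` and `result` are illustrative only, and the column list is hard-coded on the assumption that every non-key column should be collected.

```python
import pandas as pd

# The two-row frame from the question
df = pd.DataFrame({
    'board_href_deals': ['test2', 'test2'],
    'items': [{'x': 'a'}, {'x': 'b'}],
    'test1': ['test1', 'test2'],
})

g = df.groupby('board_href_deals')

# One Series of lists per column, each indexed by the group key
parts = [g[col].apply(list) for col in ['items', 'test1']]

# Align the Series side by side on the group key, then restore it as a column
result = pd.concat(parts, axis=1).reset_index()
print(result)
```

Because each Series shares the same group-key index, concat with axis=1 lines them up without any explicit merge.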
Upvotes: 1
Reputation: 863196
Use DataFrameGroupBy.agg (tested in pandas 0.23.4):
df = df.groupby('board_href_deals', as_index=False).agg(list)
print(df)
  board_href_deals                     items           test1
0            test2  [{'x': 'a'}, {'x': 'b'}]  [test1, test2]
Thanks to @jpp for a solution that works on older versions of pandas:
df = df.groupby('board_href_deals').agg(lambda x: list(x))
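For anyone who wants to try this end to end, here is a self-contained sketch that rebuilds the question's frame and runs both variants. It assumes a reasonably recent pandas in which `agg(list)` is accepted directly; the names `grouped` and `fallback` are illustrative and not part of the original answer.

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    'board_href_deals': ['test2', 'test2'],
    'items': [{'x': 'a'}, {'x': 'b'}],
    'test1': ['test1', 'test2'],
})

# Direct form: collect every non-key column into a list per group
grouped = df.groupby('board_href_deals', as_index=False).agg(list)

# Lambda form suggested for older pandas; reset_index() turns the
# group key back into an ordinary column so both results have the same shape
fallback = df.groupby('board_href_deals').agg(lambda x: list(x)).reset_index()

print(grouped)
print(fallback)
```

Passing as_index=False keeps the group key as a regular column, which is why the direct form does not need the extra reset_index() call.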
Upvotes: 2