Reputation: 62
I have the following dataframe:
driver_id status dttm
9f8f9bf3ee8f4874873288c246bd2d05 free 2018-02-04 00:19
9f8f9bf3ee8f4874873288c246bd2d05 busy 2018-02-04 01:03
8f174ffd446c456eaf3cca0915d0368d free 2018-02-03 15:43
8f174ffd446c456eaf3cca0915d0368d enroute 2018-02-03 17:02
It has 3 columns: driver_id, status, dttm.
What I need to do is group by driver_id and collect all statuses with their respective dttm values into a list in a new column called 'driver_info':
driver_id driver_info
9f8f9bf3ee8f4874873288c246bd2d05 [("free", 2018-02-04 00:19), ("busy", 2018-02-04 01:03)]
8f174ffd446c456eaf3cca0915d0368d [("free", 2018-02-03 15:43), ("enroute", 2018-02-03 17:02) ...]
How do I do that in Python 3?
I tried
dfg = df.groupby("driver_id").apply(lambda x: pd.concat((x["status"], x["dttm"])))
but the result differs from what I expect it to be...
Upvotes: 3
Views: 125
Reputation: 862641
Use GroupBy.apply with list and zip for a list of tuples:
df1 = (df.groupby('driver_id')
         .apply(lambda x: list(zip(x['status'], x['dttm'])))
         .reset_index(name='driver_info'))
print(df1)
driver_id \
0 8f174ffd446c456eaf3cca0915d0368d
1 9f8f9bf3ee8f4874873288c246bd2d05
driver_info
0 [(free, 2018-02-03 15:43), (enroute, 2018-02-0...
1 [(free, 2018-02-04 00:19), (busy, 2018-02-04 0...
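For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the question, and I am assuming dttm is a datetime column (it may well be plain strings in the original data). The helper name build_driver_info is just something I made up for illustration. Selecting the two value columns before apply is my own addition so that the grouping column is not passed into the lambda.

```python
import pandas as pd

# Sample data copied from the question; dttm parsed as datetime (assumption).
df = pd.DataFrame({
    "driver_id": [
        "9f8f9bf3ee8f4874873288c246bd2d05",
        "9f8f9bf3ee8f4874873288c246bd2d05",
        "8f174ffd446c456eaf3cca0915d0368d",
        "8f174ffd446c456eaf3cca0915d0368d",
    ],
    "status": ["free", "busy", "free", "enroute"],
    "dttm": pd.to_datetime([
        "2018-02-04 00:19",
        "2018-02-04 01:03",
        "2018-02-03 15:43",
        "2018-02-03 17:02",
    ]),
})


def build_driver_info(frame):
    """Hypothetical helper: one row per driver with a list of (status, dttm) tuples."""
    # Keep only the value columns so the lambda never sees the grouping key.
    grouped = frame.groupby("driver_id")[["status", "dttm"]]
    # zip pairs each status with its timestamp; list() materializes the pairs.
    pairs = grouped.apply(lambda g: list(zip(g["status"], g["dttm"])))
    # pairs is a Series indexed by driver_id; name the values column on reset.
    return pairs.reset_index(name="driver_info")


df1 = build_driver_info(df)
print(df1)
```

Each element of driver_info is an ordinary Python list, so you can iterate over it or index into it per driver.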
Upvotes: 2
Reputation: 17007
Try using zip and apply(list):
df['driver_info'] = list(zip(df['status'], df['dttm']))
df = df.groupby('driver_id')['driver_info'].apply(list)
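Note that the groupby line above leaves you with a Series indexed by driver_id, not a DataFrame. A possible variant that returns a two-column frame is sketched below. The sample data is rebuilt from the question, the names pairs and out are mine, and using assign plus sort=False (to keep drivers in order of first appearance instead of sorted by id) is my own choice, not something the question asked for.

```python
import pandas as pd

# Sample data copied from the question; dttm parsed as datetime (assumption).
df = pd.DataFrame({
    "driver_id": [
        "9f8f9bf3ee8f4874873288c246bd2d05",
        "9f8f9bf3ee8f4874873288c246bd2d05",
        "8f174ffd446c456eaf3cca0915d0368d",
        "8f174ffd446c456eaf3cca0915d0368d",
    ],
    "status": ["free", "busy", "free", "enroute"],
    "dttm": pd.to_datetime([
        "2018-02-04 00:19",
        "2018-02-04 01:03",
        "2018-02-03 15:43",
        "2018-02-03 17:02",
    ]),
})

# Build the (status, dttm) tuples row by row first.
pairs = list(zip(df["status"], df["dttm"]))

out = (
    df.assign(driver_info=pairs)             # new column, original df untouched
      .groupby("driver_id", sort=False)      # keep first-appearance order
      ["driver_info"]
      .apply(list)                           # collect each driver's tuples
      .reset_index()                         # back to a regular DataFrame
)
print(out)
```

Drop sort=False if you prefer the default behaviour, where groups come back sorted by driver_id.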
Upvotes: 2