Reputation: 131
Input:
print(df)
device_id ids
025c08d535a074b4 [8972]
025c08d535a074b4 [10595, 10595]
02612734f96edc43 [10016, 8795, 10019, 8791, 8351, 8791]
02612734f96edc43 [10016, 8795, 10019, 8791, 8351, 10052, 8345]
The output should be a unique list of ids for each device_id, like:
device_id ids
025c08d535a074b4 [8972,10595]
02612734f96edc43 [10016, 8795, 10019,8791,8351,10052, 8345]
I tried this using:
df=pd.DataFrame(df.groupby('device_id')['ids'].apply(set))
but it does not work properly: it adds a ' before the ids and returns lists like:
device_id ids
025c08d535a074b4 [8972,'10595, 10595]
02612734f96edc43 ['10016,8795,10019,8791,8351,8791,'10016]
Upvotes: 4
Views: 1325
Reputation: 18647
Use numpy.hstack and numpy.unique:
import numpy as np
df.groupby('device_id')['ids'].apply(lambda x: np.unique(np.hstack(x)))
Or, if maintaining order is important, use the pandas.Series constructor with drop_duplicates:
df.groupby('device_id')['ids'].apply(lambda x: pd.Series(np.hstack(x)).drop_duplicates().to_list())
[out]
device_id
025c08d535a074b4 [8972, 10595]
02612734f96edc43 [10016, 8795, 10019, 8791, 8351, 10052, 8345]
If you need the output as a DataFrame, just chain on .reset_index:
df.groupby('device_id')['ids'].apply(lambda x: np.unique(np.hstack(x))).reset_index()
[out]
device_id ids
0 025c08d535a074b4 [8972, 10595]
1 02612734f96edc43 [8345, 8351, 8791, 8795, 10016, 10019, 10052]
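For reference, here is a minimal self-contained sketch of the order-preserving variant. It assumes the ids column holds Python lists of integers, as the question's printout suggests. The sample frame is reconstructed from the question, and the helper name unique_in_order is illustrative only. It swaps in dict.fromkeys with itertools.chain instead of drop_duplicates, which should give the same first-seen ordering.

```python
from itertools import chain

import pandas as pd

# Sample data reconstructed from the question (assumes ids are lists of ints).
df = pd.DataFrame({
    "device_id": ["025c08d535a074b4", "025c08d535a074b4",
                  "02612734f96edc43", "02612734f96edc43"],
    "ids": [[8972], [10595, 10595],
            [10016, 8795, 10019, 8791, 8351, 8791],
            [10016, 8795, 10019, 8791, 8351, 10052, 8345]],
})

def unique_in_order(lists):
    """Flatten a group of lists and drop duplicates, keeping first-seen order."""
    # dict keys are unique and remember insertion order (Python 3.7+).
    return list(dict.fromkeys(chain.from_iterable(lists)))

result = df.groupby("device_id")["ids"].apply(unique_in_order).reset_index()
print(result)
```

Unlike np.unique, this does not sort the ids, so each list keeps the order in which values first appear within the device's rows.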
Upvotes: 4
Reputation: 71580
Try using:
>>> grouped = df.groupby('device_id', as_index=False).sum()
>>> grouped['ids'] = grouped['ids'].apply(lambda x: sorted(set(x), key=x.index))
>>> grouped
device_id ids
0 025c08d535a074b4 [8972, 10595]
1 02612734f96edc43 [10016, 8795, 10019, 8791, 8351, 10052, 8345]
>>>
Upvotes: 2
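The groupby-and-sum answer above rests on two plain-Python behaviours: adding lists concatenates them, and sorting a set by each value's first position in the original list restores first-seen order. A small sketch on plain lists, using the second device's values from the question (the variable names are illustrative):

```python
# Two rows of ids for one device, as plain Python lists (values from the question).
rows = [[10016, 8795, 10019, 8791, 8351, 8791],
        [10016, 8795, 10019, 8791, 8351, 10052, 8345]]

# Adding lists concatenates them; this is effectively what .sum() does per group.
combined = sum(rows, [])

# set() removes duplicates but loses order; sorting by each value's first
# position in the concatenated list restores first-seen order.
deduped = sorted(set(combined), key=combined.index)
print(deduped)  # [10016, 8795, 10019, 8791, 8351, 10052, 8345]
```

Note that list.index is a linear scan, so this approach is quadratic in the length of each concatenated list; that is fine for short lists but may be slow for long ones.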