Prajit Karande

Reputation: 131

How to get unique values from a list column, grouped by device_id, in pandas

Input:

    print(df)
    device_id           ids
    025c08d535a074b4    [8972]
    025c08d535a074b4    [10595, 10595]
    02612734f96edc43    [10016, 8795, 10019, 8791, 8351, 8791]
    02612734f96edc43    [10016, 8795, 10019, 8791, 8351, 10052, 8345]

The output should be a list of unique ids for each device_id, like:

    device_id           ids
    025c08d535a074b4    [8972, 10595]
    02612734f96edc43    [10016, 8795, 10019, 8791, 8351, 10052, 8345]

I tried this:

    df=pd.DataFrame(df.groupby('device_id')['ids'].apply(set))

but it does not work properly: it adds a ' before some ids and returns lists like:

    device_id           ids
    025c08d535a074b4    [8972,'10595, 10595]
    02612734f96edc43    ['10016,8795,10019,8791,8351,8791,'10016]

Upvotes: 4

Views: 1325

Answers (2)

Chris Adams

Reputation: 18647

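Use numpy.hstack to flatten each group's lists and numpy.unique to drop duplicates (note that numpy.unique returns the values sorted):

    import numpy as np

    df.groupby('device_id')['ids'].apply(lambda x: np.unique(np.hstack(x)))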

Or, if maintaining order is important, use the pandas.Series constructor with drop_duplicates:

    df.groupby('device_id')['ids'].apply(lambda x: pd.Series(np.hstack(x)).drop_duplicates().to_list())

[out]

    device_id
    025c08d535a074b4                                    [8972, 10595]
    02612734f96edc43    [10016, 8795, 10019, 8791, 8351, 10052, 8345]

If you need output as a DataFrame, just chain on .reset_index:

    df.groupby('device_id')['ids'].apply(lambda x: np.unique(np.hstack(x))).reset_index()

[out]

              device_id                                            ids
    0  025c08d535a074b4                                  [8972, 10595]
    1  02612734f96edc43  [8345, 8351, 8791, 8795, 10016, 10019, 10052]
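
For a self-contained check, the sketch below rebuilds the sample frame from the question and deduplicates each group in first-seen order using only the standard library (itertools.chain plus dict.fromkeys) in place of numpy. This is a swapped-in alternative, not the method shown above; the helper name unique_in_order is hypothetical, and the sketch assumes the ids column holds real Python lists of integers rather than strings.

```python
from itertools import chain

import pandas as pd

# Sample data copied from the question; assumes `ids` holds real lists of ints.
df = pd.DataFrame({
    "device_id": ["025c08d535a074b4", "025c08d535a074b4",
                  "02612734f96edc43", "02612734f96edc43"],
    "ids": [[8972], [10595, 10595],
            [10016, 8795, 10019, 8791, 8351, 8791],
            [10016, 8795, 10019, 8791, 8351, 10052, 8345]],
})


def unique_in_order(lists):
    """Flatten an iterable of lists and drop duplicates, keeping first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(chain.from_iterable(lists)))


# One row per device_id, each holding a plain Python list of unique ids.
result = df.groupby("device_id")["ids"].apply(unique_in_order).reset_index()
print(result)
```

Because the helper returns plain lists, the ids column ends up holding Python lists rather than numpy arrays, which may be easier to serialize.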

Upvotes: 4

U13-Forward

Reputation: 71580

Try summing the list column within each group, which concatenates the lists, and then dropping duplicates while keeping first-seen order:

    >>> grouped = df.groupby('device_id', as_index=False).sum()
    >>> grouped['ids'] = grouped['ids'].apply(lambda x: sorted(set(x), key=x.index))
    >>> grouped
              device_id                                            ids
    0  025c08d535a074b4                                  [8972, 10595]
    1  02612734f96edc43  [10016, 8795, 10019, 8791, 8351, 10052, 8345]
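
The deduplication step relies on list.index returning the position of a value's first occurrence, so sorting the set by that key restores first-seen order. A minimal pure-Python sketch of just that step, using the two lists for device 02612734f96edc43 from the question (the variable names combined and deduped are illustrative):

```python
# Concatenation of the two lists for device 02612734f96edc43,
# i.e. what adding the two lists together yields.
combined = ([10016, 8795, 10019, 8791, 8351, 8791]
            + [10016, 8795, 10019, 8791, 8351, 10052, 8345])

# set() drops duplicates but does not guarantee order; list.index gives each
# value's first position, so using it as the sort key restores first-seen order.
deduped = sorted(set(combined), key=combined.index)
print(deduped)

# dict.fromkeys gives the same result without a linear list.index scan per value.
assert deduped == list(dict.fromkeys(combined))
```

For long lists the dict.fromkeys form is likely the cheaper choice, since each list.index call scans the list from the start.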

Upvotes: 2
