How to create a list of unique ID from a column in pandas where lists of ID are mentioned as strings in Python

Question

I have a pandas DataFrame df:

import pandas as pd

lst = [23682, 21963, 9711, 21175, 13022,1662,7399, 13679, 17654,4567,23608,2828, 1234]

lst_match = ['[21963]','[21175]', '[1662 7399 13679 ]','[17654 23608]','[2828]','0','0','0','0','0','0', '0','0' ]

df = pd.DataFrame(list(zip(lst, lst_match)),columns=['ID','ID_match'])

df

       ID            ID_match
0   23682             [21963]
1   21963             [21175]
2    9711   [1662 7399 13679]
3   21175       [17654 23608]
4   13022              [2828]
5    1662                   0
6    7399                   0
7   13679                   0
8   17654                   0
9    4567                   0
10  23608                   0
11   2828                   0
12   1234                   0

The values in the ID_match column are also IDs, stored as lists in string format.

I want to create a DataFrame of unique IDs that contains every ID whose ID_match value is something other than 0, together with the IDs mentioned in the ID_match column.

So my output DataFrame of unique IDs must look like:

How can I do this with python pandas?

jezrael · Accepted Answer

Use:

s = (df[df['ID_match'] != '0']
       .set_index('ID')['ID_match']
       .str.strip('[ ]')
       .str.split(r'\s+', expand=True)
       .stack())
print (s)
23682  0    21963
21963  0    21175
9711   0     1662
       1     7399
       2    13679
21175  0    17654
       1    23608
13022  0     2828
dtype: object


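# Series.append was removed in pandas 2.0; pd.concat gives the same result
vals = pd.concat([s.index.get_level_values(0).to_series(), s.astype(int)], ignore_index=True).unique()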
df = pd.DataFrame({'ID':vals})
print (df)
       ID
0   23682
1   21963
2    9711
3   21175
4   13022
5    1662
6    7399
7   13679
8   17654
9   23608
10   2828

Explanation:

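First keep only the rows whose ID_match is not '0', using boolean indexing
Make the ID column the index with set_index
Remove the leading and trailing [, ] and spaces with str.strip
Split the values on whitespace and reshape them into a Series with stack
Then get the first level of the MultiIndex with get_level_values and convert it with to_series
Append the Series s converted to integers (on pandas 2.0+, where Series.append was removed, use pd.concat)
Get the unique values and finally call the DataFrame constructor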
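
For comparison, the same steps can be sketched in a self-contained form with str.findall and explode instead of strip/split/stack. This is a different technique from the one in the answer above, not a restatement of it. It assumes a pandas version that has Series.explode (0.25 or later), and the names matched, targets and unique_ids are made up for illustration.

```python
import pandas as pd

lst = [23682, 21963, 9711, 21175, 13022, 1662, 7399, 13679, 17654, 4567, 23608, 2828, 1234]
lst_match = ['[21963]', '[21175]', '[1662 7399 13679 ]', '[17654 23608]', '[2828]',
             '0', '0', '0', '0', '0', '0', '0', '0']
df = pd.DataFrame({'ID': lst, 'ID_match': lst_match})

# Keep only the rows whose ID_match is not the placeholder string '0'
matched = df[df['ID_match'] != '0']

# Extract every run of digits from each string, then give each extracted ID its own row
targets = matched['ID_match'].str.findall(r'\d+').explode().astype(int)

# Source IDs first, then the IDs they point to; drop_duplicates keeps the first occurrence
unique_ids = (pd.concat([matched['ID'], targets])
                .drop_duplicates()
                .reset_index(drop=True)
                .to_frame('ID'))
print(unique_ids)
```

Because findall only looks for digits, this variant does not depend on the exact bracket and spacing layout of the strings, but it would also pick up digits from any other text that happened to be in the column.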
