linpingta
linpingta

Reputation: 2620

How to find any common existing in Pandas Column

I have a DataFrame likes below:

IDS Metric
1,2  100
1,3  200
3    300
...

I want to find any two IDs exist in the same row, for example, both "1,2" and "1,3" exist in one row, but "2,3" has no direct relationship (means no competition between them in business)

I want to have a function to judge for any two IDs common existing and return True/False.

Just for "judge for any two IDs common existing", I think the following could work:

 target_list = ['1', '2']
 df["IDS"].apply(lambda ids: all(id in ids for id in target_list)).any()
 # return True

 target_list = ['2', '3']
 df["IDS"].apply(lambda ids: all(id in ids for id in target_list)).any()
 # return False

However, as lambda function will iterate each row in df, and it may be inefficient to iterate all rows, because I only need to judge whether exist. I hope it should return when first common existing happens.

Could anyone help me about that? Thanks a lot

Upvotes: 0

Views: 45

Answers (1)

jezrael
jezrael

Reputation: 862521

Use:

df["IDS"].str.split(',', expand=True).isin(target_list).all(axis=1).any()

Another idea with sets:

target_list = ['1', '2']
s = set(target_list)

a = any(s.issubset(x.split(',')) for x in df["IDS"])
print (a)
True

Details:

print (df["IDS"].str.split(',', expand=True))
   0     1
0  1     2
1  1     3
2  3  None

print (df["IDS"].str.split(',', expand=True).isin(target_list))
       0      1
0   True   True
1   True  False
2  False  False

print (df["IDS"].str.split(',', expand=True).isin(target_list).all(axis=1))
0     True
1    False
2    False
dtype: bool

print (df["IDS"].str.split(',', expand=True).isin(target_list).all(axis=1).any())
True

Upvotes: 1

Related Questions