Reputation: 2620
I have a DataFrame likes below:
IDS Metric
1,2 100
1,3 200
3 300
...
I want to find any two IDs exist in the same row, for example, both "1,2" and "1,3" exist in one row, but "2,3" has no direct relationship (means no competition between them in business)
I want to have a function to judge for any two IDs common existing and return True/False.
Just for "judge for any two IDs common existing", I think the following could work:
target_list = ['1', '2']
df["IDS"].apply(lambda ids: all(id in ids for id in target_list)).any()
# return True
target_list = ['2', '3']
df["IDS"].apply(lambda ids: all(id in ids for id in target_list)).any()
# return False
However, as lambda function will iterate each row in df, and it may be inefficient to iterate all rows, because I only need to judge whether exist. I hope it should return when first common existing happens.
Could anyone help me about that? Thanks a lot
Upvotes: 0
Views: 45
Reputation: 862521
Use:
df["IDS"].str.split(',', expand=True).isin(target_list).all(axis=1).any()
Another idea with sets:
target_list = ['1', '2']
s = set(target_list)
a = any(s.issubset(x.split(',')) for x in df["IDS"])
print (a)
True
Details:
print (df["IDS"].str.split(',', expand=True))
0 1
0 1 2
1 1 3
2 3 None
print (df["IDS"].str.split(',', expand=True).isin(target_list))
0 1
0 True True
1 True False
2 False False
print (df["IDS"].str.split(',', expand=True).isin(target_list).all(axis=1))
0 True
1 False
2 False
dtype: bool
print (df["IDS"].str.split(',', expand=True).isin(target_list).all(axis=1).any())
True
Upvotes: 1