Reputation: 158
I have a blacklist of banned substrings. I need an if statement that checks whether ANY of the banned substrings is contained in a given URL. If the URL contains none of them, I want it to do A. If the URL contains at least one of the banned substrings, I want it to do B. The action should run only once per URL, not once for each banned substring.
black_list = ['linkedin.com', 'yellowpages.com', 'facebook.com', 'bizapedia.com', 'manta.com',
              'yelp.com', 'nextdoor.com', 'industrynet.com', 'twitter.com', 'zoominfo.com',
              'google.com', 'yellow-listings.com', 'kompass.com', 'dnb.com', 'tripadvisor.com']
Here are two simple example URLs that I'm using to check whether it works. url1 contains a banned substring, while url2 doesn't.
url1 = 'https://www.dnb.com/'
url2 = 'https://www.ok/'
I tried the code below, which works, but I was wondering if there is a better (more computationally efficient) way of doing it. I have a data frame of 100k+ URLs, so I'm worried that this will be super slow.
mask = []
for banned in black_list:
    if banned in url:
        mask.append(True)
    else:
        mask.append(False)
if any(mask):
    print("there is a banned substring inside")
else:
    print("no banned substrings inside")
Does anybody know a more efficient way of doing this?
Upvotes: 1
Views: 7273
Reputation: 15364
Here is a possible one-line solution:
print('there is a banned substring inside'
      if any(banned_str in url for banned_str in black_list)
      else 'no banned substrings inside')
If you prefer a less pythonic approach:
if any(banned_str in url for banned_str in black_list):
    print('there is a banned substring inside')
else:
    print('no banned substrings inside')
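For many URLs (such as the 100k+ rows mentioned in the question), one option is to compile the blacklist into a single regular expression once and reuse it for every URL. The following is only a minimal sketch, not a benchmark: `banned_pattern`, `is_banned`, and the `urls` list are hypothetical names introduced for illustration, and the blacklist is shortened.

```python
import re

black_list = ['linkedin.com', 'dnb.com', 'yelp.com']  # shortened list for illustration

# Escape each entry so '.' matches a literal dot, then join into one alternation.
banned_pattern = re.compile('|'.join(re.escape(banned) for banned in black_list))

def is_banned(url):
    """Return True if the URL contains any banned substring."""
    return banned_pattern.search(url) is not None

urls = ['https://www.dnb.com/', 'https://www.ok/']
results = [is_banned(url) for url in urls]
print(results)  # [True, False]
```

If the URLs live in a pandas column, `Series.str.contains` accepts a regex pattern string and could apply the same alternation to the whole column. Whether that is actually faster than `any(...)` depends on the data, so it is worth timing both.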
Upvotes: 2
Reputation: 791
You should add a flag and, depending on its value, perform either A or B.
ban_flag = False
for banned in black_list:
    if banned not in url:
        continue
    else:
        ban_flag = True
if ban_flag:
    print("there is a banned substring inside")
else:
    print("no banned substrings inside")
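Since the answer is known as soon as one banned substring matches, the loop can stop early instead of scanning the rest of the list. A minimal sketch of the same flag approach with `break`, assuming a shortened blacklist and the `url1` value from the question:

```python
black_list = ['linkedin.com', 'dnb.com', 'yelp.com']  # shortened for illustration
url = 'https://www.dnb.com/'

ban_flag = False
for banned in black_list:
    if banned in url:
        ban_flag = True
        break  # no need to check the remaining entries

if ban_flag:
    print("there is a banned substring inside")
else:
    print("no banned substrings inside")
```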
Upvotes: 0