Mathieu

Reputation: 761

Check if a string is in a link list with BeautifulSoup

I want to check whether any word from a list appears in the text of one or more links inside an HTML string.

I tried this:

from bs4 import BeautifulSoup

keywords = ["word1", "word1", "word1", "word2", "word3"]

html_template = "word4 word2 word1 <a href='#'>the link one<a/> <a \
href='#'>the word1 is link 2<a/> word7 <a \
href='#'>word3 example<a/> word453"

soup=BeautifulSoup(html_template, 'html5lib')
links=soup.findAll('a')

for keyword in keywords:
    if keyword in links:
        status="ok"
        break

Expected result: if any keyword from keywords is found inside a link in html_template, status should be "ok".

Upvotes: 0

Views: 581

Answers (1)

Rustam Garayev

Reputation: 2692

First, extract only the text of each link, without the surrounding tag:

links=soup.findAll('a')
clean_links = [link.text for link in links if link.text]
# -> ['the link one the word1 is link 2 word7 word3 example word453', ..]

When you write keyword in <list>, Python checks whether some list element is exactly equal to that keyword. In your case the link text contains other words as well, so you need to loop over the list and check whether the keyword occurs inside each element, rather than testing the list itself:

# defining status to prevent NameError in case keyword is not found
status = None
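for keyword in keywords:
    # any() stops at the first link text that contains the keyword
    if any(keyword in link for link in clean_links):
        status = "ok"
        # with a single loop, this break ends the whole search
        break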
# status -> "ok"
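
The difference between list membership and substring matching can be seen without any parsing at all. The snippet below is only a minimal sketch: link_texts is a hypothetical list of strings standing in for clean_links, and any_keyword_in_links is an illustrative helper name, not part of BeautifulSoup.

```python
# Hypothetical link texts, standing in for the clean_links list above
link_texts = ["the link one", "the word1 is link 2", "word3 example"]
keywords = ["word1", "word2", "word3"]

# List membership compares whole elements, so no element equals "word1"
print("word1" in link_texts)

# A substring test against a single element does find it
print("word1" in link_texts[1])


def any_keyword_in_links(keywords, link_texts):
    """Return True if at least one keyword occurs in at least one link text."""
    return any(keyword in text for text in link_texts for keyword in keywords)


# Same convention as above: "ok" on a match, None otherwise
status = "ok" if any_keyword_in_links(keywords, link_texts) else None
print(status)
```

Note that this is plain substring matching, so a keyword such as "word4" would also match inside "word453". If only whole words should count, the link text would need to be split into words (or matched with a word-boundary regex) first.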

Upvotes: 2
