Gilgalad930

Reputation: 1

Python - Find words in html with if statement

I have no programming experience beyond a little SQL. I am learning Python and trying to do some web scraping, but I need some guidance. Thank you in advance!

I am trying to write code that scrapes a website and finds certain words, such as "2018 estimated distribution", in any order. Once the scrape has run, it should tell me whether the condition is True or False.

Here is my code:

import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'http://www.aberdeen-asset.us/en/usretail/fund-center/tax-information'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup (page_html, "html.parser")
page_soup.h1
page_soup.p
page_soup.body.span
containers = page_soup.findAll("h3")
len(containers)
Hopeful = containers

def sanity_check(Hopeful):
    if '2018 Aberdeen Funds and Aberdeen Investment Funds Capital Gains Distributions Information' in Hopeful:
        return True
    else:
        return False

maybe = sanity_check('2018 Aberdeen Funds and Aberdeen Investment Funds Capital Gains Distributions Information')
print(maybe)

The website does not contain "2018 Aberdeen Funds and Aberdeen Investment Funds Capital Gains Distributions Information", so I expect the function to return False, but it returns True.

Am I missing something in the if statement?

Thank you

Upvotes: 0

Views: 446

Answers (1)

dgkr

Reputation: 345

The mistake is in this line:

maybe = sanity_check('2018 Aberdeen Funds and Aberdeen Investment Funds Capital Gains Distributions Information')

Edit this to:

maybe = sanity_check(Hopeful)

You are passing the string '2018 Aberdeen Funds and Aberdeen Investment Funds Capital Gains Distributions Information' as the argument to sanity_check. Inside the function, the parameter Hopeful then refers to that string rather than to the Hopeful variable you defined earlier, so the test becomes '2018 Aberdeen Funds and Aberdeen Investment Funds Capital Gains Distributions Information' in '2018 Aberdeen Funds and Aberdeen Investment Funds Capital Gains Distributions Information'. A string always contains itself, so this returns True.
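The effect can be reproduced without any scraping. Here is a minimal sketch, assuming a short list of made-up heading strings stands in for the scraped h3 results (the list contents and the TARGET name are illustrative and not part of the original code):

```python
# Stand-in for the scraped <h3> results (hypothetical data, not from the real page).
Hopeful = ["Tax Information", "2018 Estimated Distributions"]

TARGET = ("2018 Aberdeen Funds and Aberdeen Investment Funds "
          "Capital Gains Distributions Information")


def sanity_check(Hopeful):
    # Inside the function, Hopeful is whatever the caller passed in,
    # not the module-level variable that happens to share its name.
    return TARGET in Hopeful


# Passing the string itself: a string always contains itself.
print(sanity_check(TARGET))   # True

# Passing the list: checks whether any element equals TARGET.
print(sanity_check(Hopeful))  # False
```

Giving the parameter a different name from the module-level variable makes this kind of mix-up easier to spot.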

I think you meant to pass the Hopeful variable you declared earlier to your sanity_check function.
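One more thing to watch for: findAll returns tag objects rather than plain strings, so even with the corrected call a plain `in` test against that list is unlikely to match a string. Comparing against each heading's text is probably closer to what you want. The sketch below assumes the heading texts have already been extracted, for example with `[h.get_text() for h in containers]`. The helper names and the sample headings are hypothetical, and the word check is deliberately simple (case-insensitive, split on whitespace, no punctuation handling):

```python
def all_words_present(text, words):
    """Return True if every word in `words` occurs in `text`, in any order."""
    text_words = text.lower().split()
    return all(word.lower() in text_words for word in words)


def any_heading_matches(heading_texts, words):
    """Return True if at least one heading contains all of the words."""
    return any(all_words_present(text, words) for text in heading_texts)


# Hypothetical heading texts standing in for the scraped <h3> contents.
heading_texts = ["Tax Information", "2018 Estimated Distribution Schedule"]

print(any_heading_matches(heading_texts, ["2018", "estimated", "distribution"]))  # True
print(any_heading_matches(heading_texts, ["distribution", "2018", "estimated"]))  # True
print(any_heading_matches(heading_texts, ["2019", "estimated", "distribution"]))  # False
```

Because the words are checked one by one, their order in the heading does not matter, which matches the "in whatever order" requirement in the question.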

Upvotes: 2
