Reputation: 15
I want to get the href value of the "website" and "email" links.
YP_Details.txt contains this URL: https://www.yellowpages.com/bakersfield-ca/mip/robson-eilers-jewelers-6717482
Here is the code:
from urllib.request import urlopen
from bs4 import BeautifulSoup as soup
with open('YP_Details.txt', 'r') as f:
    for url in f:
        print(url)
        uClient = urlopen(url)
        page_html = uClient.read()
        uClient.close()
        page_soup = soup(page_html, "html.parser")
        out_filename = "YP_Details.csv"
        containers = page_soup.findAll("header", {"id":"main-header"})
        headers = "Business_Name,Address,Phone,Website,Email \n"
        with open(out_filename, "w") as fout:
            fout.write(headers)
            for container in containers:
                Business_Name = container.h1.text
                Address = container.h2.text
                Phone = container.p.text
                #want to get the "href" value as output
                Website_container = container.findAll("a", {"class": "website-link"})
                Website = Website_container[0].text
                #want to get the "href" value as output
                Email_container = container.findAll("a", {"class": "email-business"})
                Email = Email_container[0].text
                print("Business_Name:" + Business_Name + "Address:" + Address + "Phone:" + Phone + "Website:" + Website + "Email:" + Email + "\n" )
                fout.write(Business_Name + "," + Address.replace(",", "|") + ", " + Phone + ", " + Website + ", " + Email + "\n")
Upvotes: 1
Views: 46
Reputation: 2328
You have to reference the href attribute of the element. The href is not the text: .text refers to the content between the element's opening and closing tags, while attributes such as href are read by indexing the tag like a dictionary.
#want to get the "href" value as output
Website_container = container.findAll("a", {"class": "website-link"})
Website = Website_container[0]['href']
print(Website)
#want to get the "href" value as output
Email_container = container.findAll("a", {"class": "email-business"})
Email = Email_container[0]['href']
print(Email)
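One caveat with indexing [0]['href'] directly: it raises IndexError when a listing has no matching link, and KeyError when the anchor has no href attribute. Here is a minimal sketch of a more defensive lookup. The SAMPLE_HTML markup and the link_href helper are made up for illustration, and the mailto: prefix on the email link is an assumption, so check them against the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the listing header; the real page may differ.
SAMPLE_HTML = """
<header id="main-header">
  <h1>Example Jewelers</h1>
  <a class="website-link" href="https://www.example.com">Visit Website</a>
  <a class="email-business" href="mailto:info@example.com">Email Business</a>
</header>
"""


def link_href(container, css_class):
    """Return the href of the first <a> with the given class, or "" if absent."""
    link = container.find("a", class_=css_class)
    if link is None:
        return ""
    # .get() returns the default instead of raising KeyError on a missing attribute.
    return link.get("href", "")


page_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
container = page_soup.find("header", {"id": "main-header"})

website = link_href(container, "website-link")
email = link_href(container, "email-business")
# Assumption: the email link is a mailto: URL, so drop the scheme prefix.
if email.startswith("mailto:"):
    email = email[len("mailto:"):]
# A class that does not exist in the markup yields an empty string.
missing = link_href(container, "no-such-class")

print(website)
print(email)
```

With this helper, a listing that lacks a website or email link produces an empty CSV field rather than stopping the whole scrape.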
Upvotes: 1