Shuvankar Das

Reputation: 15

Get the "href" value

I want to get the href value of the "website" and "email" links.

YP_Details.txt contains this URL: https://www.yellowpages.com/bakersfield-ca/mip/robson-eilers-jewelers-6717482

Here is the code:

from urllib.request import urlopen
from bs4 import BeautifulSoup as soup

out_filename = "YP_Details.csv"
headers = "Business_Name,Address,Phone,Website,Email\n"

# Open the output file once so that each URL does not overwrite the previous rows.
with open('YP_Details.txt', 'r') as f, open(out_filename, "w") as fout:
    fout.write(headers)
    for url in f:
        url = url.strip()  # drop the trailing newline read from the file
        if not url:
            continue
        print(url)
        uClient = urlopen(url)
        page_html = uClient.read()
        uClient.close()
        page_soup = soup(page_html, "html.parser")

        containers = page_soup.findAll("header", {"id": "main-header"})
        for container in containers:
            Business_Name = container.h1.text
            Address = container.h2.text
            Phone = container.p.text

            # want to get the "href" value as output
            Website_container = container.findAll("a", {"class": "website-link"})
            Website = Website_container[0].text

            # want to get the "href" value as output
            Email_container = container.findAll("a", {"class": "email-business"})
            Email = Email_container[0].text

            print("Business_Name:" + Business_Name + "Address:" + Address + "Phone:" + Phone + "Website:" + Website + "Email:" + Email + "\n")
            fout.write(Business_Name + "," + Address.replace(",", "|") + ", " + Phone + ", " + Website + ", " + Email + "\n")

Upvotes: 1

Views: 46

Answers (1)

Sri

Reputation: 2328

You have to reference the href attribute of the element. The href is not the text: .text returns the content between the element's opening and closing tags, while ['href'] returns the value of the attribute on the tag itself.

          #want to get the "href" value as output
          Website_container = container.findAll("a", {"class": "website-link"})
          Website = Website_container[0]['href']
          print(Website)

          #want to get the "href" value as output
          Email_container = container.findAll("a", {"class": "email-business"})
          Email = Email_container[0]['href']
          print(Email)
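
To see the difference without fetching the live page, here is a minimal, self-contained sketch. The HTML snippet, the business name, the example.com addresses, and the first_href helper are all made up for illustration; only the class names come from the question. It also assumes the email link carries a "mailto:" prefix, which is common but not guaranteed for any given page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the class names are taken from the question.
SAMPLE_HTML = """
<header id="main-header">
  <h1>Example Jewelers</h1>
  <a class="website-link" href="http://www.example.com">Visit Website</a>
  <a class="email-business" href="mailto:info@example.com">Email Business</a>
</header>
"""


def first_href(container, css_class):
    """Return the href of the first <a> with the given class, or "" if absent."""
    link = container.find("a", class_=css_class)
    if link is None:
        return ""
    # .get() avoids a KeyError when the tag has no href attribute.
    return link.get("href", "")


page_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
container = page_soup.find("header", id="main-header")

website = first_href(container, "website-link")
email = first_href(container, "email-business")

# Assumption: the email link uses a "mailto:" prefix; strip it if present.
if email.startswith("mailto:"):
    email = email[len("mailto:"):]

print(website)  # the attribute value
print(email)    # the address without the prefix
print(container.find("a", class_="website-link").text)  # the link text
```

Using find plus .get instead of findAll(...)[0]['href'] is a design choice here: it returns an empty string rather than raising IndexError or KeyError when a listing has no website or email link.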

Upvotes: 1
