user10878963

Reputation: 49

Problem exporting Web Url results into CSV using beautifulsoup3

Problem: I tried to export the results (name, address, phone) to a CSV file, but the CSV code does not return the expected results.

#Import the installed modules
import requests
from bs4 import BeautifulSoup
import json
import re
import csv

#To get the data from the web page we will use requests get() method
url = "https://www.lookup.pk/dynamic/search.aspx?searchtype=kl&k=gym&l=lahore"
page = requests.get(url)

# To check the http response status code
print(page.status_code)

#Now I have collected the data from the web page, let's see what we got
print(page.text)

#The above data can be viewed in a pretty format by using BeautifulSoup's prettify() method. For this we will create a bs4 object and use the prettify method
soup = BeautifulSoup(page.text, 'lxml')
print(soup.prettify())

#Find all DIVs that contain Companies information
product_name_list = soup.findAll("div",{"class":"CompanyInfo"})

#Find all company names under the h2 tag
company_name_list_heading = soup.findAll("h2")

#Find all addresses on the page under the a tag
company_name_list_items = soup.findAll("a",{"class":"address"})

#Find all phone numbers on the page under the ul tag
company_name_list_numbers = soup.findAll("ul",{"class":"submenu"})

I created for loops to print out all the company data:

for company_address in company_name_list_items:
    print(company_address.prettify())

# Create for loop to print out all company Names
for company_name in company_name_list_heading:
    print(company_name.prettify())

# Create for loop to print out all company Numbers
for company_numbers in company_name_list_numbers:
    print(company_numbers.prettify())

Below is the code to export the results (name, address and phone number) to CSV:

    outfile = open('gymlookup.csv','w', newline='')

writer = csv.writer(outfile)

writer.writerow(["name", "Address", "Phone"])

product_name_list = soup.findAll("div",{"class":"CompanyInfo"})
company_name_list_heading = soup.findAll("h2")
company_name_list_items = soup.findAll("a",{"class":"address"})
company_name_list_numbers = soup.findAll("ul",{"class":"submenu"})

Here are the for loops that iterate over the data:

for company_name in company_name_list_heading:
    names = company_name.contents[0]

for company_numbers in company_name_list_numbers:
    names = company_numbers.contents[1]

for company_address in company_name_list_items:
    address = company_address.contents[1]

    writer.writerow([name, Address, Phone])

outfile.close()

Upvotes: 0

Views: 26

Answers (1)

chitown88

Reputation: 28565

You need to work on understanding how for loops work, and also the difference between strings, variables, and other data types. You also need to work on using what you have seen in other Stack Overflow questions and learn to apply it. This is essentially the same as the other 2 questions you already posted, just with a different site you're scraping from (I didn't flag it as a duplicate, as you're new to Stack Overflow and web scraping, and I remember what it was like to try to learn). I'll still answer your questions, but eventually you need to be able to find the answers on your own and learn how to adapt and apply them (coding isn't paint-by-numbers). I do see that you are adapting some of it, though. Good job finding the "div",{"class":"CompanyInfo"} tag to get the company info.

The data you are pulling (name, address, phone) needs to be extracted inside a loop over the div class=CompanyInfo elements, searching within each element rather than across the whole page. You could theoretically keep it the way you have it now, by putting those values into lists and then writing to the CSV file from your lists, but there's a risk that a value is missing for some company. The lists then fall out of step, and your data no longer lines up with the correct company.
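To make that risk concrete, here is a small sketch that compares the two approaches on made-up markup. The HTML below is hypothetical (it is not taken from the real page) and simply assumes the second company has no phone list; the variable names are illustrative too.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not the real page): "Gym B" has no phone list.
SAMPLE_HTML = """
<div class="CompanyInfo"><h2>Gym A</h2><ul class="submenu"><li>111</li></ul></div>
<div class="CompanyInfo"><h2>Gym B</h2></div>
<div class="CompanyInfo"><h2>Gym C</h2><ul class="submenu"><li>333</li></ul></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Page-wide lists: three names but only two phones, so pairing them by
# position attaches Gym C's phone to Gym B and drops Gym C entirely.
names = [h2.get_text(strip=True) for h2 in soup.find_all("h2")]
phones = [ul.get_text(strip=True) for ul in soup.find_all("ul", {"class": "submenu"})]
flat_rows = list(zip(names, phones))

# Per-block lookup: each company is searched on its own, so a missing
# phone becomes an empty string for that company only.
nested_rows = []
for block in soup.find_all("div", {"class": "CompanyInfo"}):
    name = block.find("h2").get_text(strip=True)
    phone_tag = block.find("ul", {"class": "submenu"})
    phone = phone_tag.get_text(strip=True) if phone_tag else ""
    nested_rows.append((name, phone))

print(flat_rows)
print(nested_rows)
```

In the page-wide version nothing raises an error, which is what makes this kind of mismatch easy to miss.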

Here's what the full code looks like. Notice that the variables are stored within the loop and then written. It then goes to the next block of CompanyInfo and continues.

#Import the installed modules
import requests
from bs4 import BeautifulSoup
import csv

#To get the data from the web page we will use requests get() method
url = "https://www.lookup.pk/dynamic/search.aspx?searchtype=kl&k=gym&l=lahore"
page = requests.get(url)

# To check the http response status code
print(page.status_code)

#Now I have collected the data from the web page, let's see what we got
print(page.text)

#The above data can be viewed in a pretty format by using BeautifulSoup's prettify() method. For this we will create a bs4 object and use the prettify method
soup = BeautifulSoup(page.text, 'html.parser')
print(soup.prettify())

outfile = open('gymlookup.csv','w', newline='')
writer = csv.writer(outfile)
writer.writerow(["Name", "Address", "Phone"])


#Find all DIVs that contain Companies information
product_name_list = soup.findAll("div",{"class":"CompanyInfo"})

# Now loop through those elements
for element in product_name_list:

    # Takes 1 block of the "div",{"class":"CompanyInfo"} tag and finds/stores name, address, phone
    name = element.find('h2').text
    address = element.find('address').text.strip()
    phone = element.find("ul",{"class":"submenu"}).text.strip()

    # writes the name, address, phone to csv
    writer.writerow([name, address, phone])

    # the loop now goes to the next "div",{"class":"CompanyInfo"} tag and repeats

outfile.close()
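One caveat with the loop above: `find()` returns `None` when a tag is absent, so `.text` would raise an `AttributeError` for any company block missing a heading, address, or phone list. A more defensive variant might look like the sketch below. The helper names (`text_or_blank`, `extract_rows`, `write_csv`) and the sample markup are hypothetical and only serve the illustration; the tag and class names are the ones used in the code above.

```python
import csv
import io

from bs4 import BeautifulSoup


def text_or_blank(parent, *args, **kwargs):
    """Return the stripped text of the first matching tag, or '' if none is found."""
    tag = parent.find(*args, **kwargs)
    return tag.get_text(" ", strip=True) if tag else ""


def extract_rows(html):
    """Build one [name, address, phone] row per CompanyInfo block."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for block in soup.find_all("div", {"class": "CompanyInfo"}):
        rows.append([
            text_or_blank(block, "h2"),
            text_or_blank(block, "address"),
            text_or_blank(block, "ul", {"class": "submenu"}),
        ])
    return rows


def write_csv(rows, outfile):
    """Write the header and rows to any open text file object."""
    writer = csv.writer(outfile)
    writer.writerow(["Name", "Address", "Phone"])
    writer.writerows(rows)


# Hypothetical sample markup (an assumption): the second block has only a name.
sample = (
    '<div class="CompanyInfo"><h2>Gym A</h2><address>1 Main St</address>'
    '<ul class="submenu"><li>111</li><li>222</li></ul></div>'
    '<div class="CompanyInfo"><h2>Gym B</h2></div>'
)

rows = extract_rows(sample)
buffer = io.StringIO(newline="")  # in-memory stand-in for the CSV file
write_csv(rows, buffer)
print(buffer.getvalue())
```

With a real file, the same function can be called inside `with open('gymlookup.csv', 'w', newline='') as outfile:`, which closes the file even if an error occurs partway through.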

Upvotes: 1
