Python looping for web scraping

Question

I am trying to extract data from a website.

The code I have written is:

import csv

import requests 

from bs4 import BeautifulSoup

page = requests.get("http://www.realcommercial.com.au/sold/property-offices-retail-showrooms+bulky+goods-land+development-hotel+leisure+medical+consulting-other-in-wa/list-1?includePropertiesWithin=includesurrounding&activeSort=list-date&autoSuggest=true")

soup = BeautifulSoup(page.content, 'html.parser')

Address_1 = soup.find('p', attrs={'class': 'details-panel__address'})

Address = Address_1.text.strip()

The result I am getting is:

 'GF 255 Adelaide TerracePerth, WA 6000'

which is just one line of address of one listing.

When I use soup.find_all, I get a result like:

 p class="details-panel__address" data-reactid="90">GF 255 Adelaide 
 TerracePerth, WA 6000
369-371 Oxford 
 StreetMount Hawthorn, WA 6016,

  p class="details-panel__address" data-reactid="148">2 Lloyd Street
 Midland, WA 6056,

  p class="details-panel__address" data-reactid="172">Bluenote Building, 16/162 
 Colin StreetWest Perth, WA 6005,

  p class="details-panel__address" data-reactid="196">Bluenote Building, 10/162 
 Colin StreetWest Perth, WA 6005

Please suggest how I can extract the address, property type, sold date, sale value, area, agency name, agent's name, and phone number for all the listings on this page. Also, I do not know how to use a loop to open each listing on a particular page and get the information out of it.

Himanshu dua · Accepted Answer

soup.find_all returns a list of elements. To get the text, iterate over that list and read each element's text attribute.

import requests 

from bs4 import BeautifulSoup

page = requests.get("http://www.realcommercial.com.au/sold/property-offices-retail-showrooms+bulky+goods-land+development-hotel+leisure+medical+consulting-other-in-wa/list-1?includePropertiesWithin=includesurrounding&activeSort=list-date&autoSuggest=true")

soup = BeautifulSoup(page.content, 'html.parser')

# find_all returns every matching <p>, not just the first one
Address_1 = soup.find_all('p', attrs={'class': 'details-panel__address'})
address_list = [address.text.strip() for address in Address_1]
print(address_list)

# Collect the link to each listing's detail page
links = soup.find_all('a', attrs={'class': 'details-panel'})
hrefs = [link['href'] for link in links]
print(hrefs)
# Now iterate through the list of URLs and extract the required data
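The last step, looping over the collected links, might look roughly like the sketch below. It is only an illustration under a few assumptions: the names BASE_URL, SELECTORS, parse_listing, scrape_listings and the fetch parameter are made up for this example, the hrefs may be relative to the site root, and the only selector taken from the question is the address class. The selectors for property type, sold date, agent and so on are not known here, so they would have to be found by inspecting a listing page and added to SELECTORS.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumption: relative hrefs are resolved against the site root.
BASE_URL = "http://www.realcommercial.com.au"

# Hypothetical field -> CSS selector map. Only the address class comes from
# the question; add the other fields after inspecting a real listing page.
SELECTORS = {
    "address": "p.details-panel__address",
}


def parse_listing(html, selectors=SELECTORS):
    """Return a {field: text or None} dict for one listing page."""
    soup = BeautifulSoup(html, "html.parser")
    record = {}
    for field, selector in selectors.items():
        element = soup.select_one(selector)
        # select_one returns None when nothing matches, so guard before reading text
        record[field] = element.get_text(strip=True) if element else None
    return record


def scrape_listings(hrefs, fetch, base_url=BASE_URL):
    """Fetch each listing URL with fetch(url) and collect one record per listing."""
    records = []
    for href in hrefs:
        # urljoin resolves relative hrefs and leaves absolute URLs unchanged
        url = urljoin(base_url, href)
        record = parse_listing(fetch(url))
        record["url"] = url
        records.append(record)
    return records
```

With requests, this could be called as scrape_listings(hrefs, fetch=lambda url: requests.get(url).content). Passing the fetch function in keeps the parsing logic testable without hitting the site.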
