Renu sharma

Reputation: 87

Python looping for web scraping

I am trying to extract data from a website.

The code I have written is:

import csv
import requests
from bs4 import BeautifulSoup

page = requests.get("http://www.realcommercial.com.au/sold/property-offices-retail-showrooms+bulky+goods-land+development-hotel+leisure+medical+consulting-other-in-wa/list-1?includePropertiesWithin=includesurrounding&activeSort=list-date&autoSuggest=true")

soup = BeautifulSoup(page.content, 'html.parser')

# find() returns only the first matching <p> element
Address_1 = soup.find('p', attrs={'class': 'details-panel__address'})

Address = Address_1.text.strip()

The result I am getting is:

 'GF 255 Adelaide TerracePerth, WA 6000'

which is the address of only one listing.

When I use soup.find_all, I get a result like this:

<p class="details-panel__address" data-reactid="90"><span class="details-panel__address-text text-truncate" data-reactid="91">GF 255 Adelaide Terrace</span><span class="details-panel__address-text text-truncate" data-reactid="92">Perth, WA 6000</span></p>,

<p class="details-panel__address" data-reactid="122"><span class="details-panel__address-text text-truncate" data-reactid="123">369-371 Oxford Street</span><span class="details-panel__address-text text-truncate" data-reactid="124">Mount Hawthorn, WA 6016</span></p>,

<p class="details-panel__address" data-reactid="148"><span class="details-panel__address-text text-truncate" data-reactid="149">2 Lloyd Street</span><span class="details-panel__address-text text-truncate" data-reactid="150">Midland, WA 6056</span></p>,

<p class="details-panel__address" data-reactid="172"><span class="details-panel__address-text text-truncate" data-reactid="173">Bluenote Building, 16/162 Colin Street</span><span class="details-panel__address-text text-truncate" data-reactid="174">West Perth, WA 6005</span></p>,

<p class="details-panel__address" data-reactid="196"><span class="details-panel__address-text text-truncate" data-reactid="197">Bluenote Building, 10/162 Colin Street</span><span class="details-panel__address-text text-truncate" data-reactid="198">West Perth, WA 6005</span></p>
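The run-together "TerracePerth" in the find() result happens because .text concatenates the text of the two <span> children without any separator. A minimal sketch that parses a copy of two of the elements shown above (so it runs offline), loops over the find_all() results, and either passes a separator to get_text() or keeps the street and suburb spans as separate fields:

```python
from bs4 import BeautifulSoup

# Two of the <p> elements printed above, copied here so the example runs offline
html = """
<p class="details-panel__address" data-reactid="90"><span class="details-panel__address-text text-truncate" data-reactid="91">GF 255 Adelaide Terrace</span><span class="details-panel__address-text text-truncate" data-reactid="92">Perth, WA 6000</span></p>
<p class="details-panel__address" data-reactid="122"><span class="details-panel__address-text text-truncate" data-reactid="123">369-371 Oxford Street</span><span class="details-panel__address-text text-truncate" data-reactid="124">Mount Hawthorn, WA 6016</span></p>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list-like ResultSet, so loop over it
addresses = []
for p in soup.find_all("p", class_="details-panel__address"):
    # separator=", " puts a comma between the street span and the suburb span
    addresses.append(p.get_text(separator=", ", strip=True))

# Alternatively, keep the street and the suburb as separate fields
parts = [
    [span.get_text(strip=True) for span in p.find_all("span")]
    for p in soup.find_all("p", class_="details-panel__address")
]

print(addresses)
print(parts)
```

Keeping the spans separate is handy if the street and suburb/postcode should end up in different CSV columns.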

Please suggest how I can extract the address, property type, sold date, sale price, area, agency name, agent's name and phone number for all the listings on this page. I also don't know how to use a loop to open each listing on a page and get the information from it.
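The question's code imports csv but never uses it. Once each listing's details are collected into a dictionary, csv.DictWriter is one way to save them. A minimal sketch; the column names are hypothetical placeholders for the fields listed above, and the sample row reuses one address from the output above with every other field left empty:

```python
import csv
import io

# Hypothetical column names for the fields the question asks about
FIELDNAMES = [
    "address", "property_type", "sold_date", "sale_price",
    "area", "agency", "agent_name", "agent_phone",
]


def write_listings(rows, fileobj):
    """Write a list of dicts to fileobj as CSV; missing keys become empty cells."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES, restval="")
    writer.writeheader()
    writer.writerows(rows)


# Example: one partially filled row, written to an in-memory buffer
rows = [{"address": "GF 255 Adelaide Terrace, Perth, WA 6000"}]
buffer = io.StringIO()
write_listings(rows, buffer)
print(buffer.getvalue())
```

To write a real file instead, open it with `open("listings.csv", "w", newline="", encoding="utf-8")` and pass that file object to write_listings; the csv module quotes values that contain commas, such as the addresses, automatically.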

Upvotes: 2

Views: 100

Answers (1)

Himanshu dua

Reputation: 2523

soup.find_all returns a list of elements. To get the text, iterate over the list and read each element's text attribute.

import requests
from bs4 import BeautifulSoup

# Keep the URL on one line: a triple-quoted string spanning two lines would embed a newline in it
page = requests.get("http://www.realcommercial.com.au/sold/property-offices-retail-showrooms+bulky+goods-land+development-hotel+leisure+medical+consulting-other-in-wa/list-1?includePropertiesWithin=includesurrounding&activeSort=list-date&autoSuggest=true")

soup = BeautifulSoup(page.content, 'html.parser')

# find_all() returns every matching element, not just the first one
Address_1 = soup.find_all('p', attrs={'class': 'details-panel__address'})
address_list = [address.text.strip() for address in Address_1]
print(address_list)

# Collect the link to each listing's detail page
links = soup.find_all('a', attrs={'class': 'details-panel'})
hrefs = [link['href'] for link in links]
print(hrefs)

# Now iterate through the list of urls and extract the required data
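A sketch of that loop, under a few assumptions: the hrefs may be relative, so each one is resolved against the search page's URL with urljoin; parse_listing and every CSS selector in it other than the address class are hypothetical placeholders, because the HTML of an individual listing page isn't shown here and has to be checked in the browser's developer tools; and a short pause between requests keeps the load on the site low.

```python
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def text_or_none(soup, selector):
    """Return the stripped text of the first element matching a CSS selector, or None."""
    element = soup.select_one(selector)
    return element.get_text(" ", strip=True) if element else None


def parse_listing(html):
    """Extract fields from one listing page.

    Every selector except the address class is a HYPOTHETICAL placeholder;
    replace them with the real classes from the listing page's HTML.
    """
    soup = BeautifulSoup(html, "html.parser")
    return {
        "address": text_or_none(soup, ".details-panel__address"),  # seen on the results page; may differ here
        "property_type": text_or_none(soup, ".property-type"),      # hypothetical
        "sold_date": text_or_none(soup, ".sold-date"),              # hypothetical
        "sale_price": text_or_none(soup, ".sale-price"),            # hypothetical
        "area": text_or_none(soup, ".floor-area"),                  # hypothetical
        "agency": text_or_none(soup, ".agency-name"),               # hypothetical
        "agent_name": text_or_none(soup, ".agent-name"),            # hypothetical
        "agent_phone": text_or_none(soup, ".agent-phone"),          # hypothetical
    }


def fetch_with_requests(url):
    """Download one page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


def scrape_listings(hrefs, base_url, fetch=fetch_with_requests, delay=1.0):
    """Open every listing link and return one dict of fields per listing."""
    results = []
    for href in hrefs:
        url = urljoin(base_url, href)  # works for both relative and absolute hrefs
        results.append(parse_listing(fetch(url)))
        time.sleep(delay)  # pause between requests to go easy on the server
    return results
```

With the answer's variables in scope, `listings = scrape_listings(hrefs, base_url=page.url)` would return a list of dicts, with None for any field whose placeholder selector doesn't match. Taking fetch as a parameter makes it possible to try parse_listing on saved HTML without requesting the site again.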

Upvotes: 3
