Reputation: 87
I am trying to extract data from a website.
The code I have written is:
import csv
import requests
from bs4 import BeautifulSoup

page = requests.get("http://www.realcommercial.com.au/sold/property-offices-retail-showrooms+bulky+goods-land+development-hotel+leisure+medical+consulting-other-in-wa/list-1?includePropertiesWithin=includesurrounding&activeSort=list-date&autoSuggest=true")
soup = BeautifulSoup(page.content, 'html.parser')
# find() returns only the first matching <p>
Address_1 = soup.find('p', attrs={'class': 'details-panel__address'})
Address = Address_1.text.strip()
The result I am getting is:
'GF 255 Adelaide TerracePerth, WA 6000'
which is the address of only one listing.
When I use soup.find_all, I get results like:
<p class="details-panel__address" data-reactid="90"><span class="details-panel__address-text text-truncate" data-reactid="91">GF 255 Adelaide Terrace</span><span class="details-panel__address-text text-truncate" data-reactid="92">Perth, WA 6000</span></p>,
<p class="details-panel__address" data-reactid="122"><span class="details-panel__address-text text-truncate" data-reactid="123">369-371 Oxford Street</span><span class="details-panel__address-text text-truncate" data-reactid="124">Mount Hawthorn, WA 6016</span></p>,
<p class="details-panel__address" data-reactid="148"><span class="details-panel__address-text text-truncate" data-reactid="149">2 Lloyd Street</span><span class="details-panel__address-text text-truncate" data-reactid="150">Midland, WA 6056</span></p>,
<p class="details-panel__address" data-reactid="172"><span class="details-panel__address-text text-truncate" data-reactid="173">Bluenote Building, 16/162 Colin Street</span><span class="details-panel__address-text text-truncate" data-reactid="174">West Perth, WA 6005</span></p>,
<p class="details-panel__address" data-reactid="196"><span class="details-panel__address-text text-truncate" data-reactid="197">Bluenote Building, 10/162 Colin Street</span><span class="details-panel__address-text text-truncate" data-reactid="198">West Perth, WA 6005</span></p>
Please suggest how I can extract the address, property type, sold date, sale value, area, agency name, agent's name and phone number for all the listings on this page. I also do not know how to use a loop to open each listing on a page and get the information out of it.
Upvotes: 2
Views: 100
Reputation: 2523
soup.find_all returns a list of elements, not a single element. To get the text, iterate over that list and read each element's text attribute.
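As an illustration, here is that iteration run against two of the `<p>` elements from the question's find_all output, pasted in as a string so no network request is needed (the data-reactid attributes are dropped for brevity). It also shows one detail worth knowing: `.text` glues the two `<span>`s together ("TerracePerth"), whereas `get_text(', ', strip=True)` puts a separator between the street line and the suburb line. This is a sketch on sample markup, not on the live page.

```python
from bs4 import BeautifulSoup

# Two <p> elements copied from the find_all output in the question.
html = """
<p class="details-panel__address"><span class="details-panel__address-text text-truncate">GF 255 Adelaide Terrace</span><span class="details-panel__address-text text-truncate">Perth, WA 6000</span></p>
<p class="details-panel__address"><span class="details-panel__address-text text-truncate">369-371 Oxford Street</span><span class="details-panel__address-text text-truncate">Mount Hawthorn, WA 6016</span></p>
"""

soup = BeautifulSoup(html, 'html.parser')

# find_all returns a list-like ResultSet, so loop over it.
address_elements = soup.find_all('p', attrs={'class': 'details-panel__address'})

# .text concatenates the text of both <span>s with nothing in between.
run_together = [p.text.strip() for p in address_elements]
print(run_together)

# get_text(separator, strip=True) strips each text piece and joins them with the separator.
addresses = [p.get_text(', ', strip=True) for p in address_elements]
print(addresses)
```

The first print gives `['GF 255 Adelaide TerracePerth, WA 6000', '369-371 Oxford StreetMount Hawthorn, WA 6016']`, the second `['GF 255 Adelaide Terrace, Perth, WA 6000', '369-371 Oxford Street, Mount Hawthorn, WA 6016']`.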
import requests
from bs4 import BeautifulSoup

# Keep the URL on one line: a triple-quoted string spanning two lines would put a newline inside the URL.
page = requests.get("http://www.realcommercial.com.au/sold/property-offices-retail-showrooms+bulky+goods-land+development-hotel+leisure+medical+consulting-other-in-wa/list-1?includePropertiesWithin=includesurrounding&activeSort=list-date&autoSuggest=true")
soup = BeautifulSoup(page.content, 'html.parser')

# find_all returns every matching <p>, so loop over the list to get each address
Address_1 = soup.find_all('p', attrs={'class': 'details-panel__address'})
address_list = [address.text.strip() for address in Address_1]
print(address_list)

# Collect the link to each listing on the results page
links = soup.find_all('a', attrs={'class': 'details-panel'})
hrefs = [link['href'] for link in links]
print(hrefs)
# Now iterate through the list of urls and extract the required data
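A minimal sketch of that loop, which also writes the results to a CSV file since the question already imports csv. Several things here are assumptions, not taken from the site: every CSS selector in the hypothetical `FIELD_SELECTORS` dict except `.details-panel__address` is a placeholder (inspect an actual listing page in the browser's developer tools and substitute the real classes), and the hrefs are resolved with `urljoin` in case they are relative. The `fetch` argument is passed in so the parser can be tried on saved HTML; for the live site you would pass something that wraps `requests.get`. A short delay between requests keeps the loop from hammering the server.

```python
import csv
import time
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# HYPOTHETICAL selectors: only details-panel__address appears in the question's HTML,
# and even that class comes from the search-results page, not a listing page.
FIELD_SELECTORS = {
    'address': '.details-panel__address',
    'property_type': '.property-type',  # hypothetical
    'sold_date': '.sold-date',          # hypothetical
    'sale_price': '.sale-price',        # hypothetical
    'area': '.floor-area',              # hypothetical
    'agency_name': '.agency-name',      # hypothetical
    'agent_name': '.agent-name',        # hypothetical
    'agent_phone': '.agent-phone',      # hypothetical
}


def parse_listing(html):
    """Return one dict per listing page; a field whose element is missing becomes ''."""
    soup = BeautifulSoup(html, 'html.parser')
    row = {}
    for field, selector in FIELD_SELECTORS.items():
        element = soup.select_one(selector)
        row[field] = element.get_text(', ', strip=True) if element else ''
    return row


def scrape_listings(page_url, hrefs, fetch, delay=1.0):
    """Open each listing link in turn and parse it. fetch(url) must return the page's HTML."""
    rows = []
    for href in hrefs:
        listing_url = urljoin(page_url, href)  # works for relative and absolute hrefs
        rows.append(parse_listing(fetch(listing_url)))
        time.sleep(delay)  # pause between requests
    return rows


def write_csv(rows, file_obj):
    """Write the parsed rows with a header line taken from FIELD_SELECTORS."""
    writer = csv.DictWriter(file_obj, fieldnames=list(FIELD_SELECTORS))
    writer.writeheader()
    writer.writerows(rows)


# Usage with the page and hrefs from the code above:
# rows = scrape_listings(page.url, hrefs, lambda u: requests.get(u).text)
# with open('listings.csv', 'w', newline='') as f:
#     write_csv(rows, f)
```

Keeping the parsing in its own function means you can save one listing page to disk, tune the selectors against it, and only then run the full loop. To cover more than one results page, the same loop can be repeated over the list-1, list-2, ... search URLs, assuming the site paginates that way.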
Upvotes: 3