Reputation: 87
I am trying to scrape a web page with the following code:
import requests
from bs4 import BeautifulSoup
page = requests.get("http://www.realcommercial.com.au/sold/property-offices-retail-showrooms+bulky+goods-land+development-hotel+leisure-medical+consulting-other-in-vic/list-1?includePropertiesWithin=includesurrounding&activeSort=list-date&autoSuggest=true")
soup = BeautifulSoup(page.content, 'html.parser')
links = soup.find_all('a', attrs={'class': 'details-panel'})
hrefs = [link['href'] for link in links]
for urls in hrefs:
    pages = requests.get(urls)
    soup_2 = BeautifulSoup(pages.content, 'html.parser')
    Date = soup_2.find_all('li', attrs={'class': 'sold-date'})
    Sold_Date = [Sold_Date.text.strip() for Sold_Date in Date]
    Address_1 = soup_2.find_all('p', attrs={'class': 'full-address'})
    Address = [Address.text.strip() for Address in Address_1]
The above code is returning only the details from the first URL in hrefs:
['Mon 05-Jun-17'] ['261 Keilor Road, Essendon, Vic 3040']
I need the loop to run through each URL in hrefs and return similar details from each one. Please suggest what I should add or edit in the above code. Any help would be highly appreciated.
Thanks
Upvotes: 0
Views: 1474
Reputation: 11009
You are overwriting the Address and Sold_Date objects on each iteration:
# after assignment previous data will be lost
Sold_Date = [Sold_Date.text.strip() for Sold_Date in Date]
Address = [Address.text.strip() for Address in Address_1]
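A minimal illustration of the difference, using toy data rather than the scraper itself:

```python
results = []
for batch in (['a'], ['b', 'c']):
    items = [s.upper() for s in batch]  # reassigned on each pass: the previous value is lost
    results += items                    # accumulating keeps the results from every pass

# After the loop, items holds only ['B', 'C'],
# while results holds ['A', 'B', 'C'].
```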
Try initializing empty lists outside the loop and extending them:
import requests
from bs4 import BeautifulSoup
page = requests.get("http://www.realcommercial.com.au/sold/property-offices-retail-showrooms+bulky+goods-land+development-hotel+leisure-medical+consulting-other-in-vic/list-1?includePropertiesWithin=includesurrounding&activeSort=list-date&autoSuggest=true")
soup = BeautifulSoup(page.content, 'html.parser')
links = soup.find_all('a', attrs={'class': 'details-panel'})
hrefs = [link['href'] for link in links]
addresses = []
sold_dates = []
for urls in hrefs:
    pages = requests.get(urls)
    soup_2 = BeautifulSoup(pages.content, 'html.parser')
    dates_tags = soup_2.find_all('li', attrs={'class': 'sold-date'})
    sold_dates += [date_tag.text.strip() for date_tag in dates_tags]
    addresses_tags = soup_2.find_all('p', attrs={'class': 'full-address'})
    addresses += [address_tag.text.strip() for address_tag in addresses_tags]
which gives us:
>>> sold_dates
[u'Tue 06-Jun-17',
u'Tue 06-Jun-17',
u'Tue 06-Jun-17',
u'Tue 06-Jun-17',
u'Tue 06-Jun-17',
u'Tue 06-Jun-17',
u'Tue 06-Jun-17',
u'Mon 05-Jun-17',
u'Mon 05-Jun-17',
u'Mon 05-Jun-17']
>>> addresses
[u'141 Napier Street, Essendon, Vic 3040',
u'5 Loupe Crescent, Leopold, Vic 3224',
u'80 Ryrie Street, Geelong, Vic 3220',
u'18 Boase Street, Brunswick, Vic 3056',
u'130-186 Buckley Street, West Footscray, Vic 3012',
u'223 Park Street, South Melbourne, Vic 3205',
u'48-50 The Centreway, Lara, Vic 3212',
u'14 Webster Street, Ballarat, Vic 3350',
u'323 Nepean Highway, Frankston, Vic 3199',
u'341 Buckley Street, Aberfeldie, Vic 3040']
Upvotes: 1
Reputation: 482
The code is behaving as written: each pass through the loop overwrites the previous results. You need to store the information in an external list that persists across iterations.
import requests
from bs4 import BeautifulSoup
page = requests.get("http://www.realcommercial.com.au/sold/property-offices-retail-showrooms+bulky+goods-land+development-hotel+leisure-medical+consulting-other-in-vic/list-1?includePropertiesWithin=includesurrounding&activeSort=list-date&autoSuggest=true")
soup = BeautifulSoup(page.content, 'html.parser')
links = soup.find_all('a', attrs={'class': 'details-panel'})
hrefs = [link['href'] for link in links]
Data = []
for urls in hrefs:
    pages = requests.get(urls)
    soup_2 = BeautifulSoup(pages.content, 'html.parser')
    Date = soup_2.find_all('li', attrs={'class': 'sold-date'})
    Sold_Date = [Sold_Date.text.strip() for Sold_Date in Date]
    Address_1 = soup_2.find_all('p', attrs={'class': 'full-address'})
    Address = [Address.text.strip() for Address in Address_1]
    Data.append(Sold_Date + Address)
print(Data)  # a bare return is only valid inside a function
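Note that Sold_Date + Address concatenates the two lists into one flat row. If you instead want each date paired with its matching address, zip keeps them aligned, assuming both lists come back in the same order (a sketch with hypothetical per-page data):

```python
# Hypothetical per-page lists; zip pairs them by position
sold_dates = ['Mon 05-Jun-17', 'Tue 06-Jun-17']
addresses = ['261 Keilor Road, Essendon, Vic 3040',
             '141 Napier Street, Essendon, Vic 3040']
records = list(zip(sold_dates, addresses))
# records[0] == ('Mon 05-Jun-17', '261 Keilor Road, Essendon, Vic 3040')
```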
Upvotes: 1