ashkrelja

Reputation: 80

Looping through web pages to webscrape data

I'm trying to loop through Zillow pages and extract data. I know that the URL is updated with a new page number after each iteration, but the extracted data looks as if the URL were still on page 1.

import selenium
from selenium import webdriver
import requests
from bs4 import BeautifulSoup
import pandas as pd

next_page='https://www.zillow.com/romeo-mi-48065/real-estate-agent-reviews/'

num_data1=pd.DataFrame(columns=['name','number'])

browser=webdriver.Chrome()
browser.get('https://www.zillow.com/romeo-mi-48065/real-estate-agent-reviews/')

while True:

    page=requests.get(next_page)

    contents=page.content

    soup = BeautifulSoup(contents, 'html.parser')

    number_p=soup.find_all('p', attrs={'class':'ldb-phone-number'},text=True)
    name_p=soup.find_all('p', attrs={'class':'ldb-contact-name'},text=True)

    number_p=pd.DataFrame(number_p,columns=['number'])
    name_p=pd.DataFrame(name_p,columns=['name'])

    num_data=number_p['number'].apply(lambda x: x.text.strip())
    nam_data=name_p['name'].apply(lambda x: x.text.strip())

    number_df=pd.DataFrame(num_data,columns=['number'])
    name_df=pd.DataFrame(nam_data,columns=['name'])

    num_data0=pd.concat([number_df,name_df],axis=1)

    num_data1=num_data1.append(num_data0)

    try:

        button=browser.find_element_by_css_selector('.zsg-pagination>li.zsg-pagination-next>a').click()
        next_page=str(browser.current_url)

    except IndexError:

        break

Upvotes: 0

Views: 380

Answers (2)

Dean W.

Reputation: 642

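Replace page=requests.get(next_page) and contents=page.content with contents=browser.page_source. page_source is already a string of HTML, so it can be passed straight to BeautifulSoup.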

Basically, what's happening is that you're going to the next page in Chrome, but then loading that page's URL with requests, and Zillow redirects that request back to page one (probably because it doesn't carry the browser's cookies or appropriate request headers).
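
A minimal sketch of that idea, assuming Selenium 4's find_element(By.CSS_SELECTOR, ...) call and the selectors from the question (which may no longer match Zillow's markup). The helper names parse_contacts, scrape_all_pages and run_with_selenium are made up for illustration, and so is the max_pages cap. The loop reads the HTML from the browser that actually did the navigation instead of requesting the URL a second time:

```python
from typing import Callable, List, Tuple

Row = Tuple[str, str]


def parse_contacts(html: str) -> List[Row]:
    """Return (name, number) pairs found in one page of rendered HTML."""
    # Imported lazily so the pagination loop can be tried without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    names = soup.find_all("p", attrs={"class": "ldb-contact-name"})
    numbers = soup.find_all("p", attrs={"class": "ldb-phone-number"})
    # zip() pairs entries by position; this assumes every agent has both fields.
    return [
        (n.get_text(strip=True), p.get_text(strip=True))
        for n, p in zip(names, numbers)
    ]


def scrape_all_pages(
    get_html: Callable[[], str],
    go_next: Callable[[], bool],
    parse: Callable[[str], list],
    max_pages: int = 50,  # hypothetical safety cap, not part of the original code
) -> list:
    """Parse the current page, move to the next one, stop when there is none."""
    rows = []
    for _ in range(max_pages):
        rows.extend(parse(get_html()))
        if not go_next():
            break
    return rows


def run_with_selenium(start_url: str) -> List[Row]:
    """Drive Chrome and collect rows from every page (assumes Selenium 4)."""
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    browser = webdriver.Chrome()
    browser.get(start_url)

    def go_next() -> bool:
        # A missing "next" link raises NoSuchElementException, not IndexError.
        try:
            browser.find_element(
                By.CSS_SELECTOR, ".zsg-pagination>li.zsg-pagination-next>a"
            ).click()
            return True
        except NoSuchElementException:
            return False

    try:
        # page_source is the HTML of whatever page the browser is on right now.
        return scrape_all_pages(lambda: browser.page_source, go_next, parse_contacts)
    finally:
        browser.quit()
```

Calling run_with_selenium('https://www.zillow.com/romeo-mi-48065/real-estate-agent-reviews/') would return the rows from every page; pd.DataFrame(rows, columns=['name','number']) turns them into the same kind of frame as in the question. Depending on how the site loads, you may need an explicit wait after the click before reading page_source.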

Upvotes: 0

Breaks Software

Reputation: 1761

Why not make your life easier and use the Zillow API instead of scraping? (Do you even have permission to scrape their site?)

Upvotes: 0
