dabeksci

Reputation: 41

Python BeautifulSoup not scraping multiple pages

I'm trying to scrape a website that shows 15 ads per page; clicking through to the next page shows the next 15 ads.

For some reason, the script only scrapes one page and never moves on to the next.

Here's my script:

page_num = 10
curr_page = 1
i = 1
car_title, price_hrk, year_made, km_made, date_pub, temp = [], [], [], [], [], []
title = soup.find_all(class_="classified-title")
price_kn = soup.find_all(class_="price-kn")
info = soup.find_all(class_="info-wrapper")
date = soup.find_all("span", class_="date")


# while the current page is less than or equal to the page_num variable
while curr_page <= page_num:
    # make a request with a current page
    page = requests.get("https://www.oglasnik.hr/prodaja-automobila?page={}".format(curr_page))
    # pass it to beautiful soup
    soup = BeautifulSoup(page.content, "html.parser")

    # while i is within the 15 ads on a single page
    while i <= 15:
        # check for existence
        if title[i]:
            # append the value
            car_title.append(title[i].get_text())
        else:
            # append NaN
            car_title.append(np.nan)

        if price_kn[i]:
            price_hrk.append(price_kn[i].get_text())
        else:
            price_hrk.append(np.nan)

        if date[i]:
            date_pub.append(date[i].get_text())
        else:
            date_pub.append(np.nan)

        # dual values, so append both to a temporary list
        for tag in info[i].find_all("span", class_="classified-param-value"):
            for val in tag:
                temp.append(val)

        try:
            # if length of element is less than 5
            if len(temp[0]) < 5:
                # it's a year, append to year_made list
                year_made.append(temp[0])
            km_made.append(temp[2])
        except IndexError:
            # if index out of bound append NaN
            year_made.append(np.nan)
            km_made.append(np.nan)

        # reset temp
        temp = []
        # add 1 to i element
        i += 1

    # add 1 to current page
    curr_page += 1

When I print the length of any of the lists afterwards, I get 15.

Can someone tell me what I'm doing wrong or point me in the right direction?

Thanks.

Upvotes: 4

Views: 128

Answers (1)

Jack Fleeting

Reputation: 24930

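You also need to reset your i. After the first page i is 16, so the condition of the inner while loop is already false on every later page and nothing more gets appended. Right before (or after)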

   curr_page += 1

Add:

   i = 1

And it should work.
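
For reference, here is a minimal sketch of the loop structure with the counter reset per page. It makes no real requests: fetch_ads is a hypothetical stand-in for the requests.get + soup.find_all step, returning made-up titles. It also assumes two things the question's code does not do: the lookup happens inside the page loop (in the question, the find_all calls run once before the loop, so they would presumably keep returning the same page's elements), and indexing starts at 0 rather than 1.

```python
def fetch_ads(page_number, per_page=15):
    # Hypothetical stand-in for requests.get(...) followed by soup.find_all(...).
    # It returns fake ad titles so the loop logic can be run offline.
    return ["ad {}-{}".format(page_number, n) for n in range(1, per_page + 1)]


def scrape_with_reset(page_count):
    # Same while-loop shape as the question, with the counter reset per page.
    titles = []
    curr_page = 1
    while curr_page <= page_count:
        ads = fetch_ads(curr_page)  # look up the ads again for every page
        i = 0                       # reset the index; Python lists start at 0
        while i < len(ads):
            titles.append(ads[i])
            i += 1
        curr_page += 1
    return titles


def scrape_with_for(page_count):
    # Equivalent version using for loops, which leaves no counter to reset.
    titles = []
    for curr_page in range(1, page_count + 1):
        for ad in fetch_ads(curr_page):
            titles.append(ad)
    return titles


if __name__ == "__main__":
    print(len(scrape_with_reset(10)))
    print(len(scrape_with_for(10)))
```

With 10 pages of 15 ads each, both functions collect 150 titles. The for-loop version may be the easier one to maintain, since the index cannot be left over from the previous page.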

Upvotes: 2
