Wiidi Chandra

Reputation: 1

Scraping a paginated website with BeautifulSoup in Python

I just started learning Python. I'm trying to scrape all the phone numbers from a paginated website, but my code doesn't follow the pagination links and keeps looping over the same page. Any advice?

from bs4 import BeautifulSoup
import requests

for i in range(5000):
    url = "http://www.mobil123.com/mobil?type=used&page_number=1".format(i)
    r = requests.get(url)
    soup = BeautifulSoup(r.content)

    for record in soup.findAll('div', {"class": "card-contact-wrap"}):
        for data in soup.findAll('div', {"data-get-content": "#whatsapp"}):
            print(record.find('li').text)
            print(data.text)

Upvotes: 0

Views: 2579

Answers (2)

Padraic Cunningham

Reputation: 180411

As already pointed out, you are missing the actual format placeholder. If you want all the pages, you can scrape the number of pages from the initial page and loop over that range instead of hard-coding the page count. The last page number is in the second-to-last li of the pagination list:

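import requests
from bs4 import BeautifulSoup

def get_pages(url):
    # Fetch the first page and yield it before reading the page count from it.
    soup = BeautifulSoup(requests.get(url).content, "lxml")
    yield soup
    # The start URL already has a query string, so add the page parameter with "&".
    url += "&page_number={}"
    # The second-to-last li in the pagination list holds the last page number.
    last_page = int(soup.select("#js-listings-pagination li")[-2].text)
    for n in range(2, last_page + 1):
        yield BeautifulSoup(requests.get(url.format(n)).content, "lxml")


start = "http://www.mobil123.com/mobil?type=used"

for soup in get_pages(start):
    print(soup)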
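
To see the "second-to-last li" lookup in isolation, here is a small offline sketch. The sample markup is an assumption about what the pagination list might look like (a "Prev" item, numbered pages, then a "Next" item); only the #js-listings-pagination selector comes from the code above. The helper name last_page_number is hypothetical.

```python
from bs4 import BeautifulSoup

# Assumed pagination markup, for illustration only; the real page may differ.
SAMPLE_HTML = """
<ul id="js-listings-pagination">
  <li><a href="?page_number=1">Prev</a></li>
  <li><a href="?page_number=1">1</a></li>
  <li><a href="?page_number=2">2</a></li>
  <li><a href="?page_number=3">3</a></li>
  <li><a href="?page_number=4">4</a></li>
  <li><a href="?page_number=3">Next</a></li>
</ul>
"""

def last_page_number(html):
    """Return the page count read from the second-to-last pagination item."""
    soup = BeautifulSoup(html, "html.parser")
    items = soup.select("#js-listings-pagination li")
    # items[-1] is assumed to be the "Next" link, so items[-2] is the last page.
    return int(items[-2].get_text(strip=True))

last_page = last_page_number(SAMPLE_HTML)
# Pages still to fetch after the first one.
remaining_pages = list(range(2, last_page + 1))
print(last_page, remaining_pages)
```

If the site lays out its pagination differently (for example, no "Next" item on the final page), the index would need adjusting, so it is worth checking the markup in the browser first.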

Upvotes: 1

Tanu

Reputation: 1563

You forgot to put a format placeholder in the string. Change url = "...." to:

  url = "http://www.mobil123.com/mobil?type=used&page_number={0}".format(i)
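
A minimal sketch of why the original loop kept hitting the same page: str.format() only substitutes where the string contains a placeholder, and silently returns the string unchanged otherwise. The helper build_page_urls is a hypothetical name used only for this illustration.

```python
# Hypothetical helper, not part of the original code.
def build_page_urls(template, pages):
    """Fill each page number into the template and return the resulting URLs."""
    return [template.format(n) for n in pages]

# The question's string: no placeholder, so format() has nothing to replace.
without_placeholder = "http://www.mobil123.com/mobil?type=used&page_number=1"
# The corrected string: {0} is replaced by the first argument to format().
with_placeholder = "http://www.mobil123.com/mobil?type=used&page_number={0}"

same_urls = build_page_urls(without_placeholder, range(1, 4))
page_urls = build_page_urls(with_placeholder, range(1, 4))

# Without the placeholder, every iteration produces the identical URL.
print(len(set(same_urls)))
for url in page_urls:
    print(url)
```

Note that the question's range(5000) starts at 0, so in practice you would probably want range(1, n + 1) once the placeholder is in place.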

Upvotes: 1
