ViktorovicsL

Reputation: 75

Python - Iterate through pages with BeautifulSoup

I use BeautifulSoup4 to scrape data from a few web pages. In the example below, the URL is https://wadsfred.aliexpress.com/store/425826/search/1.html, and there are 96 pages. My problem is that the script throws an error after several pages, usually when it reaches page 15-20. The error message:

Traceback (most recent call last):
  File "main.py", line 34, in <module>
    if next_page.text != 'Next':
AttributeError: 'NoneType' object has no attribute 'text'

Thanks for the help in advance!

import requests
import os
import csv
from itertools import count
from bs4 import BeautifulSoup

os.chdir(r'C:\MyFolder')  # raw string so the backslash is not read as an escape
page_nr = 1
price = []
min_order = []
prod_name = []

for page_number in count(start = 1):
    url = 'https://wadsfred.aliexpress.com/store/425826/search/{}'.format(page_nr) + '.html'
    print(url)
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    for div_b in soup.find_all('div', {'class':'cost'}):
        price.append(div_b.text)

    for min_or in soup.find_all('span', {'class':'min-order'}):
        min_order.append(min_or.text)

    for pr_name in soup.find_all('div', {'class':'detail'}):
        for pr_h in pr_name.find_all('h3'):
            for pr_title in pr_h.find_all('a'):
                prod_name_s = (pr_title.get('title').strip())
                prod_name.append(prod_name_s[:120])

    print(len(prod_name))
    page_nr = page_nr + 1
    next_page = soup.find('a', {'class':'ui-pagination-next'})
    if next_page.text != 'Next':
        break

Upvotes: 1

Views: 1102

Answers (2)

ewwink

Reputation: 19154

The request gets redirected to a login page, which has no 'Next' link, so soup.find returns None. Add a User-Agent header to your request:

heads = {"User-Agent" : 'Mozilla/5.0......'}
for page_number in count(start = 1):
    .....
    response = requests.get(url, headers=heads)

Even better, use requests.Session() to create a persistent session that keeps cookies across requests.
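
A minimal sketch of how the session and the header could fit together, with a guard for a missing pagination link. The helper names build_session and has_next_page and the User-Agent value are assumptions for illustration, not part of the original script; any realistic browser string should do.

```python
import requests
from bs4 import BeautifulSoup

# Assumption: placeholder User-Agent; substitute the string of a real browser.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def build_session(user_agent=USER_AGENT):
    """Return a Session that sends the same User-Agent and keeps cookies."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def has_next_page(html):
    """Return True only if the pagination link exists and is labelled 'Next'."""
    soup = BeautifulSoup(html, "html.parser")
    next_page = soup.find("a", {"class": "ui-pagination-next"})
    # soup.find returns None when the tag is missing (e.g. on a login page),
    # so check for None before touching .text
    return next_page is not None and next_page.text.strip() == "Next"
```

Inside the loop you would then call session.get(url) instead of requests.get(url) and break when has_next_page(response.content) is False.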

Upvotes: 1

Pradeep Pathak

Reputation: 454

The 'a' tag with class 'ui-pagination-next' is probably not present on some pages. Since you already know there are 96 pages in total, you can drop that check and loop over a fixed range instead. Also put the scraping block in a try/except so that an error on one page is skipped rather than stopping the script.
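
Roughly, that idea could look like the sketch below. The function scrape_pages and its parameters fetch_page and extract are hypothetical names: fetch_page stands for whatever downloads page N (for example a wrapper around requests.get), and extract stands for the BeautifulSoup loops from the question.

```python
def scrape_pages(fetch_page, extract, first_page, last_page):
    """Visit pages first_page..last_page and skip any page that raises."""
    results = []
    failed = []
    for page_nr in range(first_page, last_page + 1):
        try:
            html = fetch_page(page_nr)
            results.extend(extract(html))
        except Exception as exc:  # broad on purpose in this sketch
            # remember the page so it can be retried or inspected later
            failed.append((page_nr, repr(exc)))
    return results, failed
```

Calling scrape_pages(fetch_page, extract, 1, 96) would cover the 96 pages. Keeping the list of failed pages matters here: silently skipping them would hide the fact that the site started returning a login page.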

Upvotes: 0
