Okenite

Reputation: 189

Scraping multiple pages with Python and BeautifulSoup

I'm trying to scrape many pages in Python using BeautifulSoup, but without success.

I tried using requests.get() and session.get(). There are 92 pages to scrape.

import requests
from bs4 import BeautifulSoup

with requests.Session() as session:  # Session must be instantiated to be used as a context manager
    count = 0
    for i in range(92):
        count += 1
        page = "https://www.paginegialle.it/lazio/roma/dentisti/p-" + str(count) + ".html"
        r = session.get(page)
        soup = BeautifulSoup(r.content, "html.parser")  # soup is reassigned on every iteration

Using print(page), the URLs are formatted correctly. But when I print soup to see all the values stored in the variable, only the values from the first page are printed. I'm using a Jupyter notebook.
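One thing the loop above does not do is keep the pages: soup is rebound on every iteration, so after the loop only one page's parse tree is left to inspect. A minimal sketch that stores every page, assuming the p-N.html URL pattern from the question. BASE_URL, fetch_html and scrape_pages are hypothetical names, and the standard-library urllib fetcher stands in for requests only so the sketch has no extra dependencies:

```python
import urllib.request
from typing import Callable, Dict

# Hypothetical constant, assuming the URL pattern shown in the question.
BASE_URL = "https://www.paginegialle.it/lazio/roma/dentisti/p-{}.html"


def fetch_html(url: str) -> str:
    """Download one page with the standard library (no retries, no custom headers)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def scrape_pages(n_pages: int, fetch: Callable[[str], str] = fetch_html) -> Dict[int, str]:
    """Return {page_number: html} for pages 1..n_pages, keeping every page."""
    pages: Dict[int, str] = {}
    for page_no in range(1, n_pages + 1):
        url = BASE_URL.format(page_no)
        # Each page gets its own entry instead of overwriting a single variable.
        pages[page_no] = fetch(url)
    return pages
```

Each stored page can then be parsed on its own, e.g. BeautifulSoup(pages[5], "html.parser"). Passing a different fetch function, such as one that returns session.get(url).text, leaves the loop unchanged.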

Upvotes: 1

Views: 175

Answers (3)

dabingsou

Reputation: 2469

Another solution, using the simplified_scrapy library.

from simplified_scrapy.request import req
from simplified_scrapy.simplified_doc import SimplifiedDoc

count = 0
for i in range(1, 93):  # pages p-1.html .. p-92.html
    count += 1
    html = req.get('https://www.paginegialle.it/lazio/roma/dentisti/p-' + str(i) + '.html')
    doc = SimplifiedDoc(html)
    print(doc.select('title>text()'))  # print each page's <title> text
print(count)
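A note on the page numbers: range(92) yields 0 through 91, so a loop that puts i straight into the URL asks for p-0.html and never reaches p-92.html, while range(1, 93) covers pages 1 to 92. A quick check, where page_urls is a hypothetical helper assuming the same URL pattern:

```python
def page_urls(first: int, last: int) -> list:
    """Build the listing URLs for pages first..last inclusive (hypothetical helper)."""
    base = "https://www.paginegialle.it/lazio/roma/dentisti/p-{}.html"
    return [base.format(n) for n in range(first, last + 1)]


zero_based = list(range(92))
print(zero_based[0], zero_based[-1])  # 0 91: p-0 is requested, p-92 is missing

urls = page_urls(1, 92)
print(len(urls))  # 92
print(urls[0])    # ends with p-1.html
print(urls[-1])   # ends with p-92.html
```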

Upvotes: 0

Bruno

Reputation: 695

This version requests each page in turn and prints its body:

from bs4 import BeautifulSoup
import requests

for count in range(1, 93):  # pages p-1.html .. p-92.html
    source1 = requests.get("https://www.paginegialle.it/lazio/roma/dentisti/p-" + str(count) + ".html").text

    soup1 = BeautifulSoup(source1, 'lxml')  # the 'lxml' parser requires the lxml package

    print(soup1.body)
    print()
print("done")
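Printing the whole body of 92 pages is hard to compare by eye; printing only each page's title is a quicker way to confirm that different pages are coming back. With BeautifulSoup that is soup1.title.string. The sketch below swaps in the standard library's html.parser.HTMLParser so it runs without extra packages; TitleParser and page_title are hypothetical helpers:

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is captured.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html: str) -> str:
    """Return the stripped <title> text of an HTML document, or '' if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


print(page_title("<html><head><title> Page 2 </title></head></html>"))  # Page 2
```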

Upvotes: 0

Tal Avissar

Reputation: 10304

You can do it as follows:

import requests
from bs4 import BeautifulSoup

for i in range(1, 93):  # range(92) would request p-0.html and skip p-92.html
    url = "https://www.paginegialle.it/lazio/roma/dentisti/p-" + str(i) + ".html"
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    p = soup.select('p')  # all <p> elements on this page
    print(len(p))
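If the counts (or the parsed values) come out identical for every page, one possible explanation, not confirmed for this site, is that the server returns the same content for several page numbers, for example by redirecting them. A quick check is to hash each page's HTML and group the page numbers that share a hash. A minimal sketch, assuming the pages have been collected into a {page_number: html} dict; find_duplicate_pages is a hypothetical helper:

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def find_duplicate_pages(pages: Dict[int, str]) -> List[List[int]]:
    """Group page numbers whose HTML is byte-for-byte identical (hypothetical helper)."""
    by_hash: Dict[str, List[int]] = defaultdict(list)
    for page_no, html in sorted(pages.items()):
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        by_hash[digest].append(page_no)
    # Only groups containing more than one page are duplicates.
    return [group for group in by_hash.values() if len(group) > 1]


sample = {1: "<p>a</p>", 2: "<p>a</p>", 3: "<p>b</p>"}
print(find_duplicate_pages(sample))  # [[1, 2]]
```

Because the comparison is exact, pages that differ only in something like an embedded timestamp will not be grouped. With requests, r.url and r.history also show whether a particular request was redirected.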


Upvotes: 1
