Reputation: 189
I'm trying to scrape many pages in Python using BeautifulSoup, but with no luck. I tried both requests.get() and session.get(). There are 92 pages I need to scrape.
import requests
from bs4 import BeautifulSoup
import urllib.request
with requests.Session() as session:
    count = 0
    for i in range(92):
        count += 1
        page = "https://www.paginegialle.it/lazio/roma/dentisti/p-" + str(count) + ".html"
        r = session.get(page)
        soup = BeautifulSoup(r.content)
Using print(page), the URLs are formatted correctly. But when I execute soup to print the values stored in the variable, only the values of the first page are printed.
I'm using a Jupyter notebook.
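One thing worth noticing: in the loop above, soup is reassigned on every pass, so inspecting it afterwards in the notebook can only ever show one page's parse. A minimal sketch of collecting something from every iteration instead (using in-memory HTML stand-ins rather than real requests, so the page contents here are hypothetical):

```python
from bs4 import BeautifulSoup

# Stand-ins for a few downloaded pages (hypothetical HTML, for illustration only)
fake_pages = ["<html><title>page %d</title></html>" % n for n in range(1, 4)]

titles = []  # accumulate a result per page instead of overwriting one variable
for html in fake_pages:
    soup = BeautifulSoup(html, "html.parser")
    titles.append(soup.title.string)

print(titles)  # all three titles survive the loop
```

The same pattern applies to the real loop: append whatever you extract from each soup to a list, then inspect the list after the loop.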
Upvotes: 1
Views: 175
Reputation: 2469
Another solution.
from simplified_scrapy.request import req
from simplified_scrapy.simplified_doc import SimplifiedDoc
count = 0
for i in range(92):
    count += 1
    html = req.get('https://www.paginegialle.it/lazio/roma/dentisti/p-' + str(count) + '.html')
    doc = SimplifiedDoc(html)
    print(doc.select('title>text()'))
    print(count)
Upvotes: 0
Reputation: 695
This will work.
from bs4 import BeautifulSoup
import requests
count = 0
for i in range(92):
    count += 1
    source1 = requests.get("https://www.paginegialle.it/lazio/roma/dentisti/p-" + str(count) + ".html").text
    soup1 = BeautifulSoup(source1, 'lxml')
    print(soup1.body)
    print()
print("done")
Upvotes: 0
Reputation: 10304
You can do it as below:
import requests
from bs4 import BeautifulSoup
import urllib.request
for i in range(1, 93):
    url = "https://www.paginegialle.it/lazio/roma/dentisti/p-" + str(i) + ".html"
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    p = soup.select('p')
    print(len(p))
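Since every version of this loop fires 92 requests in quick succession, it can also help to build the URL list up front and pause between fetches. A minimal sketch, assuming the same URL pattern as above and that page numbering starts at 1:

```python
def page_urls(n_pages=92):
    # Same URL pattern as the answers above; starting at page 1 is an assumption
    base = "https://www.paginegialle.it/lazio/roma/dentisti/p-{}.html"
    return [base.format(i) for i in range(1, n_pages + 1)]

urls = page_urls()
print(len(urls))  # 92
# In the real loop, fetch each URL and pause briefly between requests:
# for url in urls:
#     r = requests.get(url)
#     time.sleep(1)  # be polite to the server
```

Separating URL construction from fetching also makes it easy to verify all 92 addresses before hitting the site.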
Upvotes: 1