Reputation: 37
I have this list. What is the best way to extract a piece of info from each item and store it in another list? Assume the wanted info is <a>hello world</a>.
def pagination():
    pagination = range(1, 100)
    for p in pagination:
        page = f"https://www.xx.xx{p}"
Upvotes: 0
Views: 111
Reputation: 11515
Since you are dealing with a single host, you should first create a Session
object and reuse it for every request. A session keeps the same TCP connection
open instead of opening, closing, and reopening a socket for each page, which
makes it less likely that a site's firewall will block you or flag the traffic as a DDoS attack.
After that, you can loop over your pagination parameter and extract the title.
Below is an example.
import requests
from bs4 import BeautifulSoup


def main(url):
    with requests.Session() as req:
        for page in range(1, 11):
            r = req.get(url.format(page))
            soup = BeautifulSoup(r.content, 'html.parser')
            print(soup.title.text)


main("https://www.example.com/page={}")
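The example above prints each page title, while the question asks for the text of the <a> elements stored in a list. A minimal sketch of that step follows. It is an assumption about what is wanted, not the only way to do it: it uses the standard library's html.parser instead of BeautifulSoup so it runs without extra installs, and the names AnchorTextParser, extract_anchor_texts, collect_from_pages, and the sample HTML strings are made up for illustration.

```python
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Collect the visible text of every <a> element in a document."""

    def __init__(self):
        super().__init__()
        self.texts = []      # finished anchor texts, in document order
        self._buffer = None  # text chunks while inside an <a>, otherwise None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that appears between <a> and </a>.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            self.texts.append("".join(self._buffer).strip())
            self._buffer = None


def extract_anchor_texts(html):
    """Return the text of each <a> element found in one HTML string."""
    parser = AnchorTextParser()
    parser.feed(html)
    parser.close()
    return parser.texts


def collect_from_pages(pages):
    """Gather anchor texts from an iterable of HTML strings into one list."""
    results = []
    for html in pages:
        results.extend(extract_anchor_texts(html))
    return results


if __name__ == "__main__":
    # Hypothetical page bodies standing in for real responses.
    sample_pages = [
        "<html><body><a>hello world</a></body></html>",
        "<html><body><p>no link here</p><a href='/x'>second <b>page</b></a></body></html>",
    ]
    print(collect_from_pages(sample_pages))
```

In the session loop, you could pass r.text from each req.get(...) call to extract_anchor_texts and extend one result list with what it returns. If you keep BeautifulSoup, the equivalent per-page step would be [a.get_text(strip=True) for a in soup.find_all("a")].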
Upvotes: 1