Reputation: 187
My script currently loops through a list of 5 URLs, and once it reaches the end of the list it stops scraping. I want it to loop back to the first URL after it completes the last one. How would I achieve that?
The reason I want it to loop is to monitor for any changes in the product, such as the price.
I tried a few methods I found online but couldn't figure it out, as I am new to this. Hope you can help!
import requests
import lxml.html
from bs4 import BeautifulSoup
from fake_useragent import UserAgent
from dhooks import Webhook, Embed
import random

# Random Chrome user agent for the session
ua = UserAgent()
header = {'User-Agent': ua.chrome}

# Proxies: one proxy address per line in proxies.txt
proxy_list = []
for line in open('proxies.txt', 'r'):
    proxy_list.append(line.strip())

def get_proxy():
    # Pick a random proxy and build the dict that requests expects
    proxy = random.choice(proxy_list)
    proxies = {
        "http": proxy,
        "https": proxy
    }
    return proxies

# Opening URL file
with open('urls.txt', 'r') as file:
    for url in file.readlines():
        proxies = get_proxy()
        result = requests.get(url.strip(), headers=header, timeout=4, proxies=proxies)
        #src = result.content
        soup = BeautifulSoup(result.content, 'lxml')
Upvotes: 0
Views: 79
Reputation: 566
You can store the URLs in a list and run a while loop over it; the basic logic will be:
with open('urls.txt', 'r') as file:
    url_list = file.readlines()

pos = 0
while True:
    if pos >= len(url_list):
        pos = 0
    url = url_list[pos]
    pos += 1
    # *** rest of your logic ***
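As an alternative, the standard library's itertools.cycle yields the items of a list over and over, wrapping back to the start automatically, so you don't need to track the position yourself. A minimal sketch of the same idea, assuming the urls.txt from your question:

import itertools

with open('urls.txt', 'r') as file:
    url_list = [line.strip() for line in file]

# cycle() repeats the list forever: after the last URL it yields the first one again
for url in itertools.cycle(url_list):
    ...  # *** rest of your logic ***

Either way, since this is a price monitor you will probably also want a time.sleep() between requests so you don't hammer the sites.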
Upvotes: 1
Reputation: 357
You can add a while True: loop outside and above your main with statement and for loop (and add one level of indent to every line inside). This way the program will keep running until terminated by the user, as sketched below.
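A minimal sketch of that wrapping, reusing the request code from your question (the URL file is simply re-read on every pass):

while True:
    with open('urls.txt', 'r') as file:
        for url in file.readlines():
            proxies = get_proxy()
            result = requests.get(url.strip(), headers=header, timeout=4, proxies=proxies)
            soup = BeautifulSoup(result.content, 'lxml')
            # ... compare the product price here ...
    # end of the URL list reached: the while loop re-reads the file
    # and starts again from the first URL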
Upvotes: 0