Reputation: 187
I am new to this, and my code runs successfully, but only with one URL in the .txt file; if I add more, it throws an error. I have tried multiple methods I found on this site, but I can't seem to find one that works. If anyone can assist me, that would be great.
My main objective is for it to process the first URL and, once that has completed, move on to the second URL, looping through all of them.
Here is what I have right now...
import requests
import lxml.html
from bs4 import BeautifulSoup
from fake_useragent import UserAgent
from dhooks import Webhook, Embed
ua = UserAgent()
header = {'user-agent':ua.random}
with open('urls.txt','r') as file:
    for url in file.readlines():
        result = requests.get(url,headers=header,timeout=3)
        src = result.content
        soup = BeautifulSoup(src, 'lxml')
Upvotes: 1
Views: 123
Reputation: 4518
There is far too much going on in the code, and I'm not sure what the actual issue is. Can you read urls.txt? If so, what does it contain?
As a starting point, try separating your code into functions.
For example:
import requests
import lxml.html
from bs4 import BeautifulSoup
from fake_useragent import UserAgent
from dhooks import Webhook, Embed
def getReadMe():
    with open('urls.txt','r') as file:
        return file.read()

def getHtml(readMe):
    ua = UserAgent()
    header = {'user-agent':ua.random}
    response = requests.get(readMe,headers=header,timeout=3)
    response.raise_for_status()  # throw error for 4xx & 5xx
    return response.content

readMe = getReadMe()
print(readMe)  # TODO: does this output text? If so, what is it?
html = getHtml(readMe)
soup = BeautifulSoup(html, 'lxml')
# TODO: what is in the response html?
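If urls.txt does turn out to contain several URLs, one per line, a rough sketch of how the same helpers could then be driven in a loop might look like the following (the getHtmlForEach name is just for illustration):
def getHtmlForEach(readMe):
    # hypothetical helper, just for illustration: fetch each non-empty line separately
    pages = []
    for line in readMe.splitlines():
        url = line.strip()  # drop surrounding whitespace and the newline
        if url:
            pages.append(getHtml(url))  # reuses getHtml() defined above
    return pages

for html in getHtmlForEach(getReadMe()):
    soup = BeautifulSoup(html, 'lxml')
    # TODO: pull whatever you need out of each page here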
Upvotes: 1
Reputation: 336
You need to loop over them. This code assumes there is one URL per line in your file:
import requests
import lxml.html
from bs4 import BeautifulSoup
from fake_useragent import UserAgent
from dhooks import Webhook, Embed
ua = UserAgent()
header = {'user-agent':ua.random}
with open('urls.txt','r') as file:
    for url in file.readlines():
        result = requests.get(url,headers=header,timeout=3)
        src = result.content
        soup = BeautifulSoup(src, 'lxml')
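One thing to watch out for: readlines() keeps the trailing newline on each line, and requests can reject a URL that still ends in a newline, which may be exactly why the script only worked while the file held a single URL. A slightly hardened sketch of the same loop, with the newline stripped and HTTP errors surfaced, could look like this:
import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

ua = UserAgent()
header = {'user-agent': ua.random}

with open('urls.txt', 'r') as file:
    for line in file:
        url = line.strip()  # remove the trailing newline left on each line
        if not url:
            continue  # skip blank lines
        result = requests.get(url, headers=header, timeout=3)
        result.raise_for_status()  # surface 4xx/5xx responses instead of parsing an error page
        soup = BeautifulSoup(result.content, 'lxml')
        # TODO: process soup for this URL here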
Upvotes: 1