Reputation: 11
I tried to loop over a list of URLs to get the image URL of every page. When I loop, each request returns 400, failing from the very first call, yet when I test an individual URL on its own it works (200).
Adding a time delay didn't help.
import requests

f = open(url_file)
lineList = f.readlines()
print(lineList[0])  # Test
for url in lineList:
    print(url)  # Test -- the url is the same as lineList[0] above
    res = requests.get(url)  # works when the printed url is pasted in, but not as a variable
Expected: 200. Actual: 400.
Upvotes: 1
Views: 525
Reputation: 4483
If your url_file has newlines (\n characters) as line separators, it may result in erratic responses from the server. This is because \n is not automatically stripped from the end of each line by f.readlines(). Some servers will ignore this character in the URL and return 200 OK, some will not.
For example:
f = open(r"C:\data\1.txt") # text file with newline as line separator
list_of_urls = f.readlines()
print(list_of_urls)
Outputs
['https://habr.com/en/users/\n', 'https://stackoverflow.com/users\n']
If you run requests.get() on these exact URLs, you will receive 404 and 400 HTTP status codes respectively. Without the \n at the end they are valid, existing web pages - you can check this yourself.
You haven't noticed these extra \n in your code because you used print() on each item, and print() does not show this symbol explicitly as \n.
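A quick way to see the hidden newline is repr(), which shows escape characters explicitly (the list contents here are a hypothetical stand-in for what readlines() returns):

```python
lines = ["https://stackoverflow.com/users\n"]  # hypothetical readlines() result
for url in lines:
    print(url)        # looks clean: the trailing \n just ends the printed line
    print(repr(url))  # shows the trailing \n explicitly as an escape sequence
```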
Use splitlines() instead of readlines() to get rid of the \n at the end:
import requests

with open(url_file) as f:
    list_of_urls = f.read().splitlines()  # read file without line delimiters
for url in list_of_urls:
    res = requests.get(url)
    print(res.status_code)
Upvotes: 2
Reputation: 196
Use urllib2 and change the address of the text file where the web pages are stored:
example source of urls: http://mign.pl/ver.txt
import requests
import urllib.request as urllib2

response = urllib2.urlopen('http://mign.pl/ver.txt')
x = response.read().decode("utf-8")
d = x.split("\n")  # note: a trailing newline in the file would leave an empty string at the end of d
print(d)
for u in d:
    res = requests.get(u)
    print(res.status_code)
output:
200
200
Upvotes: 0
Reputation: 196
Another option, using a generator. Example source of urls: http://mign.pl/ver.txt
import requests
import urllib.request as urllib2
print(*(requests.get(u).status_code for u in urllib2.urlopen('http://mign.pl/ver.txt').read().decode("utf-8").split("\n")))
output:
200 200
Upvotes: 0