Reputation: 11
I tried to loop over a list of URLs to get the image URL of every page. When I loop, each request returns 400, failing from the very first call, yet when I test an individual URL on its own it works (200).
Adding a time delay didn't help.
import requests

f = open(url_file)
lineList = f.readlines()
print(lineList[0])  # Test
for url in lineList:
    print(url)  # Test -- the url is the same as lineList[0] above
    res = requests.get(url)  # works when the printed url is pasted in, but not as a variable
Expected: 200. Actual: 400.
Upvotes: 1
Views: 525
Reputation: 4483
If your url_file has newlines (\n characters) as line separators, it may result in erratic responses from the server. This is because \n is not automatically stripped from the end of each line by f.readlines(). Some servers will ignore this character in the URL and return 200 OK, some will not.
For example:
f = open(r"C:\data\1.txt") # text file with newline as line separator
list_of_urls = f.readlines()
print(list_of_urls)
Outputs
['https://habr.com/en/users/\n', 'https://stackoverflow.com/users\n']
If you run requests.get() on these exact URLs, you will receive 404 and 400 HTTP status codes respectively. Without the \n at the end they are valid, existing web pages - you can check this yourself.
You haven't noticed these extra \n in your code because you used print() on each item, and print() does not show this symbol explicitly as \n.
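A quick way to see the hidden newline is repr(), which shows escape characters explicitly (the list contents here are a hypothetical stand-in for what readlines() returns):

```python
lines = ["https://stackoverflow.com/users\n"]  # hypothetical readlines() result
for url in lines:
    print(url)        # looks clean: the trailing \n just ends the printed line
    print(repr(url))  # shows the trailing \n explicitly as an escape sequence
```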
Use splitlines() instead of readlines() to get rid of the \n at the end:
import requests

with open(url_file) as f:
    list_of_urls = f.read().splitlines()  # read file without line delimiters
for url in list_of_urls:
    res = requests.get(url)
    print(res.status_code)
Upvotes: 2
Reputation: 196
Use urllib2 and change the address of the text file where the web pages are stored:
example source of urls: http://mign.pl/ver.txt
import requests
import urllib.request as urllib2

response = urllib2.urlopen('http://mign.pl/ver.txt')
x = response.read().decode("utf-8")
d = x.split("\n")  # note: a trailing newline in the file would leave an empty string at the end of d
print(d)
for u in d:
    res = requests.get(u)
    print(res.status_code)
output:
200
200
Upvotes: 0
Reputation: 196
Another option, using a generator. Example source of urls: http://mign.pl/ver.txt
import requests
import urllib.request as urllib2
print(*(requests.get(u).status_code for u in urllib2.urlopen('http://mign.pl/ver.txt').read().decode("utf-8").split("\n")))
output:
200 200
Upvotes: 0