Reputation: 61
I'm trying to scrape a page using a list of proxies. This small problem is literally driving me nuts. It works when I enter the proxy directly, like this:
proxies = {
    'http': 'http://10.0.1.1:8080',
    'https': 'http://10.0.1.1:8080'
}
But when I build the URLs from a variable, like this:
http_proxy = 'http://' + proxy
https_proxy = 'https://' + proxy
proxies = {
    'http': http_proxy,
    'https': https_proxy,
}
I get this error:
requests.packages.urllib3.exceptions.LocationParseError: Failed to parse: 10.0.1.1:8080
This makes absolutely no sense.
Edit: I just realized it's probably because of the newline after each proxy. The proxylist.txt is hosted on a server, so now I need to find out how to get rid of the newline after each proxy. I tried things like proxy.strip('\n'), but that didn't work either.
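A quick way to see which hidden characters are actually attached to a proxy string is to print its repr(). The sketch below assumes, purely for illustration, that the server sends Windows-style line endings ('\r\n'). The thread does not confirm this, but it would explain why strip('\n') appears to do nothing: the '\r' is left behind.

```python
# Hypothetical raw line, assuming the hosted file uses '\r\n' line endings.
raw = '10.0.1.1:8080\r\n'

# repr() makes invisible characters such as '\r' and '\n' visible.
print(repr(raw))              # '10.0.1.1:8080\r\n'

# strip('\n') removes only newline characters, so the carriage return stays.
print(repr(raw.strip('\n')))  # '10.0.1.1:8080\r'

# strip() with no argument removes all leading and trailing whitespace.
print(repr(raw.strip()))      # '10.0.1.1:8080'
```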
Upvotes: 3
Views: 21090
Reputation: 33
I was going crazy because of that problem.
Try to do this:
def chomp(x):
    if x.endswith("\r\n"):
        return x[:-2]
    if x.endswith("\n") or x.endswith("\r"):
        return x[:-1]
    return x
http_proxy = 'http://' + chomp(proxy)
https_proxy = 'https://' + chomp(proxy)
proxies = {
    'http': http_proxy,
    'https': https_proxy,
}
It helped solve my problem.
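As a shorter alternative to a hand-written chomp, the built-in str.strip() with no arguments should handle the same cases, since it removes any leading and trailing whitespace. This is only a minimal sketch: build_proxies is a hypothetical helper name, and it keeps the 'https://' prefix from the question, even though the hard-coded version that worked there used 'http://' for both keys.

```python
def build_proxies(raw_line):
    """Build a requests-style proxies dict from one raw line of a proxy list.

    build_proxies is a hypothetical helper, not part of requests.
    """
    # strip() with no argument removes '\n', '\r\n', spaces and tabs
    # from both ends of the string.
    address = raw_line.strip()
    return {
        'http': 'http://' + address,
        # Prefix kept as in the question; the hard-coded example that
        # worked there used 'http://' for this key as well.
        'https': 'https://' + address,
    }


print(build_proxies('10.0.1.1:8080\r\n'))
```

The resulting dict can then be passed to requests through the proxies argument, as in the question.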
Upvotes: 3
Reputation: 597
Another dumb possibility is that the proxy itself no longer works. I ran the same code with one proxy and received this error. None of the solutions above helped me (and I believe this was fixed in newer versions: https://github.com/kennethreitz/requests/issues/4613). However, when I switched to a working proxy, I didn't encounter the problem.
Upvotes: 0
Reputation: 231
I tried a proxylist.txt with 2 lines:
10.0.1.1:8080
10.0.1.1:8181
and executed the code below:
with open('proxylist.txt', 'r') as reader:
    for line in reader:
        # Keep only the text before the first newline
        proxy = line.split('\n', 1)[0]
        http_proxy = 'http://' + proxy
        https_proxy = 'https://' + proxy
        proxies = {
            'http': http_proxy,
            'https': https_proxy,
        }
        print(proxies)
I got the expected output:
{'http': 'http://10.0.1.1:8080', 'https': 'https://10.0.1.1:8080'}
{'http': 'http://10.0.1.1:8181', 'https': 'https://10.0.1.1:8181'}
Upvotes: 1
Reputation: 61
Always check the result after using .split(); it can leave extra characters behind. I fixed my project using:
splitlines()
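For reference, here is a minimal sketch of how splitlines() might be applied to the whole downloaded list at once. parse_proxy_list and the sample text are hypothetical; the sample mixes '\r\n' and '\n' endings to show that both are handled.

```python
def parse_proxy_list(text):
    """Split the full text of a proxy list into clean 'host:port' strings.

    parse_proxy_list is a hypothetical helper used only for illustration.
    """
    proxies = []
    # splitlines() breaks on '\n', '\r\n' and '\r' and drops the line endings.
    for line in text.splitlines():
        address = line.strip()  # also remove stray spaces or tabs
        if address:             # skip blank lines
            proxies.append(address)
    return proxies


# Hypothetical file contents with mixed line endings and a trailing blank line.
sample = '10.0.1.1:8080\r\n10.0.1.1:8181\n\n'
print(parse_proxy_list(sample))  # ['10.0.1.1:8080', '10.0.1.1:8181']
```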
Upvotes: 1