taijamen

Reputation: 61

Python Requests Proxy error 'Failed to parse'

I'm trying to scrape a page using a list of proxies. This small problem is literally driving me nuts. It works when I enter the proxy directly like this:

proxies = {
    'http': 'http://10.0.1.1:8080',
    'https': 'http://10.0.1.1:8080'
}

But when I build it like this:

http_proxy = 'http://' + proxy
https_proxy = 'https://' + proxy

proxies = {
    'http': http_proxy,
    'https': https_proxy,
}

I get this error:

requests.packages.urllib3.exceptions.LocationParseError: Failed to parse: 10.0.1.1:8080

This makes absolutely no sense.

Edit: I just realized it's probably because of the newline after each proxy. I have proxylist.txt hosted on a server, so now I need to find out how to get rid of the newline after each proxy. I tried things like proxy.strip('\n'), but that didn't work either.

Upvotes: 3

Views: 21090

Answers (4)

crx

Reputation: 33

I was going crazy because of that problem.

Try to do this:

def chomp(x):
    if x.endswith("\r\n"):
        return x[:-2]
    if x.endswith("\n") or x.endswith("\r"):
        return x[:-1]
    return x

http_proxy = 'http://' + chomp(proxy)
https_proxy = 'https://' + chomp(proxy)

proxies = {
    'http': http_proxy,
    'https': https_proxy,
}

It helped solve my problem.
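
A likely reason the proxy.strip('\n') attempt from the question seemed to do nothing is that lines fetched from a server often end in '\r\n', so a stray '\r' is left behind. Note also that strip() returns a new string rather than changing the original. Here is a minimal sketch, assuming the raw line looks like the sample value below (made up for illustration), that compares the options:

```python
# Hypothetical raw line, as it might arrive from a hosted proxylist.txt
raw = "10.0.1.1:8080\r\n"

def chomp(x):
    """Remove one trailing line ending (\\r\\n, \\n or \\r), as in the answer above."""
    if x.endswith("\r\n"):
        return x[:-2]
    if x.endswith("\n") or x.endswith("\r"):
        return x[:-1]
    return x

# strip('\n') only removes newlines, so the carriage return survives
only_newline = raw.strip("\n")

# strip() with no argument removes all surrounding whitespace, including \r and \n
clean = raw.strip()

print(repr(only_newline))
print(repr(clean))
print(chomp(raw) == clean)
```

For a single line like this one, plain strip() and chomp() give the same result, so either should work when building the proxy URL.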

Upvotes: 3

Eli Borodach

Reputation: 597

Another dumb possibility is that the proxy itself is no longer working. I ran the same code with a single proxy and received this error. None of the solutions above helped me (and I believe this problem was actually fixed in newer versions: https://github.com/kennethreitz/requests/issues/4613). However, when I tried a working proxy, I didn't encounter the problem.

Upvotes: 0

StackTrace

Reputation: 231

I tested with a proxylist.txt containing two lines:
10.0.1.1:8080
10.0.1.1:8181

and ran the code below:

with open('proxylist.txt', 'r') as reader:
    for line in reader:
        # Keep only the text before the first newline
        proxy = line.split('\n', 1)[0]
        http_proxy = 'http://' + proxy
        https_proxy = 'https://' + proxy

        proxies = {
            'http': http_proxy,
            'https': https_proxy,
        }

        print(proxies)

The output was as expected:
{'http': 'http://10.0.1.1:8080', 'https': 'https://10.0.1.1:8080'}
{'http': 'http://10.0.1.1:8181', 'https': 'https://10.0.1.1:8181'}

Upvotes: 1

taijamen

Reputation: 61

Always check the result after using .split(); it can leave extra characters behind. I fixed my project using

splitlines()
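
A rough sketch of how that can look, assuming the whole list is already available as one string (for example, the body of the downloaded proxylist.txt). The function name build_proxies and the sample text are made up for illustration, and both keys use the http:// scheme, as in the hard-coded example from the question that worked:

```python
def build_proxies(text):
    """Return one requests-style proxies dict per non-empty line of text."""
    result = []
    # splitlines() handles \n, \r\n and \r, and drops the line endings
    for line in text.splitlines():
        proxy = line.strip()  # also remove stray spaces or tabs
        if not proxy:
            continue  # skip blank lines
        result.append({
            "http": "http://" + proxy,
            "https": "http://" + proxy,
        })
    return result

# Hypothetical file content with mixed line endings and a trailing blank line
sample = "10.0.1.1:8080\r\n10.0.1.1:8181\n\n"
for proxies in build_proxies(sample):
    print(proxies)
```

Each dict can then be passed as the proxies argument to requests.get().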

Upvotes: 1
