V_sqrt

Reputation: 567

ConnectionError: how to handle this error?

I have got the following error:

ProtocolError: ('Connection aborted.', OSError(0, 'Error'))

During handling of the above exception, another exception occurred:

ConnectionError                           Traceback (most recent call last)
<ipython-input> in <module>
---> 16 df['List'] = df['Link'].apply(get_all_links)

/anaconda3/lib/python3.7/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   4106             else:
   4107                 values = self.astype(object)._values
-> 4108             mapped = lib.map_infer(values, f, convert=convert_dtype)
   4109
   4110             if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

<ipython-input> in get_all_links(url)
      7     # but that's outside the scope here
      8     print(url)
----> 9     soup = BeautifulSoup(requests.get(url).content, "html.parser")
     10
     11     return [a.attrs.get('href', '') for a in soup.find_all('a')]

/anaconda3/lib/python3.7/site-packages/requests/api.py in get(url, params, **kwargs)
     74
     75     kwargs.setdefault('allow_redirects', True)
---> 76     return request('get', url, params=params, **kwargs)
     77
     78

ConnectionError: ('Connection aborted.', OSError(0, 'Error'))

I would say that a try/except block might fix the issue. Does anyone know how to handle it? This is the relevant code:

def get_all_links(url):
    soup = BeautifulSoup(requests.get(url).content, "html.parser")

    return [a.attrs.get('href', '') for a in soup.find_all('a')]

df['List'] = df['Link'].apply(get_all_links)

I think all the relevant information is shared. The website that seems to be causing the issue is 'https://www.puppetstringnews.com/'.

Some example URLs to test:

https://www.stackoverflow.com
https://deepclips.com/
https://www.puppetstringnews.com/

Upvotes: 2

Views: 3228

Answers (1)

TheEagle

Reputation: 5982

Did you try to open that website in the browser? It first prompts you with a certificate error, and when you click "accept risk and continue", it gives a 404 Not Found error. The error is probably on the server side of that website, not in your code. To verify that, you can try another webpage; with this one, you will never get the expected output. To skip such erroneous websites, wrap your get_all_links function body in a try / except Exception as e block. Additionally, in the except clause, you should print("Error with url %s : %s" % (url, e)) so you can see which URLs failed.
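
A minimal sketch of that approach, reusing the code from the question (returning an empty list for a failed URL is my assumption; you could also return None, or re-raise for non-network errors):

import requests
from bs4 import BeautifulSoup

def get_all_links(url):
    try:
        soup = BeautifulSoup(requests.get(url).content, "html.parser")
        return [a.attrs.get('href', '') for a in soup.find_all('a')]
    except Exception as e:
        # Report the failing URL and keep going instead of aborting the whole apply()
        print("Error with url %s : %s" % (url, e))
        return []  # assumption: an empty list marks a URL that could not be fetched

df['List'] = df['Link'].apply(get_all_links)

Catching requests.exceptions.RequestException instead of the bare Exception would limit the handler to network-related failures while letting genuine programming errors surface.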

Upvotes: 1
