oscar salgado

Reputation: 87

Python 3: requests module - How to break session and continue?

I am writing a web scraping program and have run into the following problem. When a URL points to a file such as *.doc or *.jpg, the request does not time out and the program never moves on to the next URL.

For example:

html = requests.get("http://www.someweb.com/abcd.doc", verify=False, timeout=5)

Can you help me get() the URL, wait for something like 10 seconds at most, and then move on to the next one?

I have also tried with eventlet:

import eventlet
eventlet.monkey_patch()
with eventlet.Timeout(10):
    html = requests.get(enlance, verify=False)

and I received a bunch of errors, as follows:

Traceback (most recent call last):
  File "emailCrawler.py", line 69, in <module>
    getLinks("")
  File "emailCrawler.py", line 64, in getLinks
    getLinks(page)
  File "emailCrawler.py", line 64, in getLinks
    getLinks(page)
  File "emailCrawler.py", line 64, in getLinks
    getLinks(page)
  File "emailCrawler.py", line 64, in getLinks
    getLinks(page)
  File "emailCrawler.py", line 64, in getLinks
    getLinks(page)
  File "emailCrawler.py", line 25, in getLinks
    html = requests.get(enlance, verify=False)
  File "/home/ccnp/environments/my_env/lib/python3.5/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/home/ccnp/environments/my_env/lib/python3.5/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/ccnp/environments/my_env/lib/python3.5/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/ccnp/environments/my_env/lib/python3.5/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/home/ccnp/environments/my_env/lib/python3.5/site-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/home/ccnp/environments/my_env/lib/python3.5/site-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/home/ccnp/environments/my_env/lib/python3.5/site-packages/urllib3/connectionpool.py", line 357, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.5/http/client.py", line 1107, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.5/http/client.py", line 1152, in _send_request
    self.endheaders(body)
  File "/usr/lib/python3.5/http/client.py", line 1103, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python3.5/http/client.py", line 934, in _send_output
    self.send(msg)
  File "/usr/lib/python3.5/http/client.py", line 877, in send
    self.connect()
  File "/home/ccnp/environments/my_env/lib/python3.5/site-packages/urllib3/connection.py", line 166, in connect
    conn = self._new_conn()
  File "/home/ccnp/environments/my_env/lib/python3.5/site-packages/urllib3/connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/home/ccnp/environments/my_env/lib/python3.5/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
  File "/home/ccnp/environments/my_env/lib/python3.5/site-packages/eventlet/greenio/base.py", line 247, in connect
    self._trampoline(fd, write=True)
  File "/home/ccnp/environments/my_env/lib/python3.5/site-packages/eventlet/greenio/base.py", line 207, in _trampoline
    mark_as_closed=self._mark_as_closed)
  File "/home/ccnp/environments/my_env/lib/python3.5/site-packages/eventlet/hubs/__init__.py", line 163, in trampoline
    return hub.switch()
  File "/home/ccnp/environments/my_env/lib/python3.5/site-packages/eventlet/hubs/hub.py", line 295, in switch
    return self.greenlet.switch()
eventlet.timeout.Timeout: 10 seconds

Upvotes: 0

Views: 1688

Answers (1)

oscar salgado

Reputation: 87

I think I have found the solution.

Instead of wrapping the request in eventlet, I wrap the BeautifulSoup call, like this:

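import eventlet
import requests
from bs4 import BeautifulSoup

try:
    # timeout=5 limits the connection attempt and each wait for data
    html = requests.get(enlance, verify=False, timeout=5)
except requests.exceptions.RequestException as e:
    # Connection errors and timeouts land here; print and move on
    print(e)
else:
    with eventlet.Timeout(5):
        # Pass bytes (html.content): from_encoding is ignored for str input
        bsObj = BeautifulSoup(html.content, "html.parser", from_encoding="iso-8859-1")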
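For what it's worth, the timeout argument of requests.get() is not a limit on the whole download. It only fires when the connection cannot be made in time or when no bytes arrive for that many seconds, so a big .doc or .jpg that keeps trickling in can run far past it. Below is a rough sketch of another way to bound the total time: stream the response, skip anything that does not look like HTML, and stop reading once an overall deadline or size limit is hit. The names read_with_deadline and fetch_html and the limits (10 seconds, 2 MiB, 8192-byte chunks) are my own assumptions, not something from requests, and the deadline here only starts once the body begins to download.

```python
import time


def read_with_deadline(chunks, max_seconds=10.0, max_bytes=2 * 1024 * 1024):
    """Join byte chunks from an iterable, giving up on time or size limits.

    Returns the collected bytes, or None if a limit was exceeded. The
    deadline is checked between chunks only, so the underlying read still
    needs its own per-read timeout.
    """
    deadline = time.monotonic() + max_seconds
    body = bytearray()
    for chunk in chunks:
        body.extend(chunk)
        if time.monotonic() > deadline or len(body) > max_bytes:
            return None
    return bytes(body)


def fetch_html(url, max_seconds=10.0):
    """Return the page body as bytes, or None if the URL should be skipped."""
    # Third-party import kept local so read_with_deadline stays stdlib-only.
    import requests

    try:
        # stream=True defers the body download; timeout=5 still bounds the
        # connection attempt and each individual wait for data.
        with requests.get(url, verify=False, timeout=5, stream=True) as resp:
            content_type = resp.headers.get("Content-Type", "")
            if "html" not in content_type:
                return None  # skip .doc, .jpg and other non-HTML responses
            return read_with_deadline(resp.iter_content(8192), max_seconds)
    except requests.exceptions.RequestException as e:
        print(e)
        return None
```

In the crawler loop, a None result from fetch_html(enlance) would simply mean "move on to the next URL"; anything else can be handed to BeautifulSoup as before.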

Upvotes: 0
