jimfawkes

Reputation: 375

Why is python requests not terminating and why are these separate logs printed?

I am running a job which makes many requests to retrieve data from an API. To make the requests, I am using the requests module and iterating over this code:

logger.debug("Some log message")
response = requests.get(
    url=self._url,
    headers=self.headers,
    auth=self.auth,
)
logger.debug("Some other log message")

This usually produces the following logs:

[...] Some log message
[2019-08-27 03:00:57,201 - DEBUG - connectionpool.py:393] https://my.url.com:port "GET /some/important/endpoint?$skiptoken='12345' HTTP/1.1" 401 0
[2019-08-27 03:00:57,601 - DEBUG - connectionpool.py:393] https://my.url.com:port "GET /some/important/endpoint?$skiptoken='12345' HTTP/1.1" 200 951999
[...] Some other log message

On very rare occasions, however, the job never terminates and the logs show:

[...] Some log message
[2019-08-27 03:00:57,201 - DEBUG - connectionpool.py:393] https://my.url.com:port "GET /some/important/endpoint?$skiptoken='12345' HTTP/1.1" 401 0

The remaining log messages were never printed and the request never returned. I am not able to reproduce the issue: when I manually made the request that never returned, it gave me the desired response.

Questions:

  1. Why does urllib3 always print a log line with status code 401 before printing one with status code 200? Is this always the case, or is it caused by an issue with the authentication or with the API server?

  2. In the rare case of the second log snippet, is my assumption correct that the application is stuck making a request which never returns? Or:

    a) Could requests.get throw an exception, so that the other log statements are never printed, and the exception then "magically" gets caught somewhere in my code?

    b) Is there a different possibility which I have not realised?


Update

Answer to Q1:

This seems to be expected behaviour for HTTP Digest Auth. See this github issue and Wikipedia.
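
For context, a minimal sketch that reproduces this behaviour, using httpbin.org's public digest-auth test endpoint rather than the API from the question: requests first sends the request without an Authorization header, receives the 401 challenge, then repeats the request with the computed Authorization header, so urllib3 logs both responses.

import logging

import requests
from requests.auth import HTTPDigestAuth

# Show urllib3's connectionpool DEBUG lines, as in the logs above.
logging.basicConfig(level=logging.DEBUG)

# Digest auth needs a challenge round trip, so the DEBUG log shows two
# lines for this single call: first a 401, then the 200 retry.
response = requests.get(
    "https://httpbin.org/digest-auth/auth/user/passwd",
    auth=HTTPDigestAuth("user", "passwd"),
)
print(response.status_code)  # 200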

Upvotes: 0

Views: 868

Answers (2)

bruno desthuilliers

Reputation: 77912

As Vithulan already answered, you should always set a timeout value when doing network calls - unless you don't care about your process staying stuck forever that is...

Now wrt/ error handling etc:

a) Could requests.get throw an exception, so that the other log statements are never printed, and the exception then "magically" gets caught somewhere in my code?

It's indeed possible that some other try/except block higher up the call stack swallows the exception, but only you can tell. Note that if that's the case, you have some very ill-behaved code - a try/except should 1/ only target the exact exception(s) it's supposed to handle, 2/ wrap as little code as possible in the try block to avoid catching similar errors from another part of the code, and 3/ never silence an exception (IOW it should at least log the exception and traceback).
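
For illustration, this is the kind of ill-behaved handler that would make the later log statements disappear without a trace (the function names here are made up):

def run_job():
    try:
        do_requests()  # hypothetical function that ends up calling requests.get
    except Exception:
        pass           # exception silently swallowed: no traceback, no log, no clue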

Note that you might as well just have a deactivated logger FWIW ;-)
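
As an illustration of that last point - a minimal sketch, assuming the handlers live on the root logger and with made-up logger names - any of the following would make the debug messages silently disappear:

import logging

logger = logging.getLogger("myapp.client")

logging.getLogger("myapp").setLevel(logging.ERROR)  # parent level filters out DEBUG
logger.disabled = True                              # logger switched off entirely
logger.propagate = False                            # records never reach the root handlers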

This being said, and until you've made sure you don't have such an issue, you can still get more debugging info by logging requests exceptions in your function:

logger.debug("Some log message")
try:
    response = requests.get(
        url=self._url,
        headers=self.headers,
        auth=self.auth,
        timeout=SOME_TIMEOUT_VALUE,
    )
except Exception as e:
    # this will log the full traceback too
    logger.exception("oops, call to %s failed : %s", self._url, e)
    # make sure we don't swallow the exception
    raise

logger.debug("Some other log message")

Now a fact of life is that HTTP requests can fail for so many reasons that you should actually expect them to fail, so you may want to have some retry mechanism. Also, the fact that the call to requests.get didn't raise doesn't mean the request succeeded - you still have to check the response code (or use response.raise_for_status()).
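
A minimal sketch of such a retry mechanism, using urllib3's Retry through a requests session (the counts, delays, status codes and URL are placeholders, not recommendations):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
# Retry up to 3 times on connection errors and on the listed status codes,
# with exponential backoff between attempts.
retry = Retry(total=3, backoff_factor=0.5, status_forcelist=[500, 502, 503, 504])
session.mount("https://", HTTPAdapter(max_retries=retry))

url = "https://my.url.com/some/important/endpoint"  # placeholder
response = session.get(url, timeout=10)
response.raise_for_status()  # turn 4xx/5xx responses into exceptions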

EDIT:

As mentioned in my question, my code has no try/except block because we would like the entire job to terminate if any problem occurs.

A try/except block doesn't prevent you from terminating the job - just re-raise the exception (possibly after X retries), or raise a new one instead, or call sys.exit() (which actually works by raising an exception) - and it lets you get useful debugging info etc., cf. my example code.
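
A minimal sketch of that pattern (MAX_RETRIES and the URL are assumed placeholders, not from the question):

import logging
import sys

import requests

logger = logging.getLogger(__name__)
MAX_RETRIES = 3  # assumed constant
url = "https://my.url.com/some/important/endpoint"  # placeholder

for attempt in range(1, MAX_RETRIES + 1):
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # turn 4xx/5xx into exceptions too
        break
    except requests.RequestException:
        logger.exception("attempt %s/%s failed", attempt, MAX_RETRIES)
else:
    # every attempt failed: terminate the whole job with a non-zero exit code
    sys.exit("giving up on %s after %s attempts" % (url, MAX_RETRIES))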

If there were an issue with the logger, it would then only occur on rare occasions. I cannot imagine a scenario where the same code is run but sometimes the loggers are activated and sometimes not.

I was talking about another logger higher up the call stack. But this was only for completeness - I really think you just have a request that never returns for lack of a timeout.

Do you know why I am noticing the Issue I talk about in Question 1?

Nope, and that's actually something I'd immediately investigate since, AFAICT, for a given request you should have either only the 401 or only the 200.

According to RFC 2616:

10.4.2 401 Unauthorized

The request requires user authentication. The response MUST include a WWW-Authenticate header field (section 14.47) containing a challenge applicable to the requested resource. The client MAY repeat the request with a suitable Authorization header field (section 14.8).

If the request already included Authorization credentials, then the 401 response indicates that authorization has been refused for those credentials. If the 401 response contains the same challenge as the prior response, and the user agent has already attempted authentication at least once, then the user SHOULD be presented the entity that was given in the response, since that entity might include relevant diagnostic information.

So unless requests does something weird with auth headers (which is not the case as far as I remember but...), you should only have one single response logged.

EDIT 2:

I wanted to say that if an exception is thrown but not explicitly caught by my code, it should terminate the job (which was the case in some tests I ran).

If an exception reaches the top of the call stack without being handled, the runtime will indeed terminate the process - but you have to be sure that no handler up the call stack kicks in and swallows the exception. Testing the function in isolation will not exhibit this issue, so you'd have to check the full call stack.

This being said:

The fact that it does not terminate suggests to me that no exception is thrown.

That's indeed the most likely explanation, but only you can make sure it's really the case (we don't know the full code, the logger configuration, etc.).

Upvotes: 1

Vithulan

Reputation: 300

To answer your questions:

  1. Seems like it's a problem with your API. To make sure, can you run a curl command and see?

    curl -i "https://my.url.com:port/some/important/endpoint?\$skiptoken='12345'"

  2. It never terminates, maybe because the API is not responding. Add a timeout to avoid this kind of blocking:

    response = requests.get(
        url=self._url,
        headers=self.headers,
        auth=self.auth,
        timeout=60,
    )
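
Note that timeout can also be given as a (connect, read) tuple if you want to bound connection setup and response reading separately - a sketch, with arbitrary values:

    response = requests.get(
        url=self._url,
        headers=self.headers,
        auth=self.auth,
        timeout=(5, 60),  # 5s to connect, up to 60s between bytes of the response
    )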

Hope this helps.

Upvotes: 2
