Quest Monger

Reputation: 8652

How to query a restful webservice using Python

I am writing a Python script that uses the Requests library to send a request to a remote web service. Here is my code (test.py):

import logging.config
from requests import Request, Session

logging.config.fileConfig('../../resources/logging.conf')
logr = logging.getLogger('pyLog')
url = 'https://158.74.36.11:7443/hqu/hqapi1/user/get.hqu'
token01 = 'hqstatus_python'
token02 = 'ytJFRyV7g'
response_length = 351

def main():
    try:
        logr.info('start SO example')

        s = Session()
        prepped = Request('GET', url, auth=(token01, token02), params={'name': token01}).prepare()
        response = s.send(prepped, stream=True, verify=False)

        logr.info('status: ' + str(response.status_code))
        logr.info('elapsed: ' + str(response.elapsed))
        logr.info('headers: ' + str(response.headers))
        logr.info('content: ' + response.raw.read(response_length).decode())


    except Exception: 
        logr.exception("Exception")
    finally:
        logr.info('stop')


if __name__ == '__main__':
    main()

I get the following successful output when I run this:

INFO test - start SO example
INFO test - status: 200
INFO test - elapsed: 0:00:00.532053
INFO test - headers: CaseInsensitiveDict({'server': 'Apache-Coyote/1.1', 'set-cookie': 'JSESSIONID=8F87A69FB2B92F3ADB7F8A73E587A10C; Path=/; Secure; HttpOnly', 'content-type': 'text/xml;charset=UTF-8', 'transfer-encoding': 'chunked', 'date': 'Wed, 18 Sep 2013 06:34:28 GMT'})
INFO test - content: <?xml version="1.0" encoding="utf-8"?>
<UserResponse><Status>Success</Status> .... </UserResponse>
INFO test - stop

As you can see, there is an odd variable, 'response_length', that I have to pass as the (optional) argument to response.raw.read() in order to read the content. It has to be set to a number equal to the length of the content. That means I need to know the length of the response content beforehand, which is unreasonable.

If I don't pass that variable, or set it to a value greater than the content length, I get the following error:

Traceback (most recent call last):
  File "\Python33\lib\http\client.py", line 590, in _readall_chunked
    chunk_left = self._read_next_chunk_size()
  File "\Python33\lib\http\client.py", line 562, in _read_next_chunk_size
    return int(line, 16)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 0: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 22, in main
    logr.info('content: ' + response.raw.read().decode())
  File "\Python33\lib\site-packages\requests\packages\urllib3\response.py", line 167, in read
    data = self._fp.read()
  File "\Python33\lib\http\client.py", line 509, in read
    return self._readall_chunked()
  File "\Python33\lib\http\client.py", line 594, in _readall_chunked
    raise IncompleteRead(b''.join(value))
http.client.IncompleteRead: IncompleteRead(351 bytes read)

How do I make this work without the 'response_length' variable? Also, are there any better options than the Requests library?

PS: This code is a standalone script and does not run in the Django framework.

Upvotes: 1

Views: 6517

Answers (2)

Martijn Pieters

Reputation: 1122332

Use the public API instead of internals and leave worrying about content length and reading to the library:

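import requests

# url, token01 and token02 are the values defined in the question's script
s = requests.Session()
s.verify = False  # skips TLS certificate verification, as in the question
s.auth = (token01, token02)
resp = s.get(url, params={'name': token01}, stream=True)
content = resp.content  # reads the whole body; no length argument needed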

or, since stream=True, you can iterate over the response:

for line in resp.iter_lines():
    # process a line
    pass

or

for chunk in resp.iter_content():
    # process a chunk
    pass

If you must have a file-like object, then resp.raw can be used (provided stream=True is set on the request, as done above), but then just call .read() without a length to read to EOF.

However, if you are not querying a resource that requires streaming (that is, anything other than a large file request, a need to inspect the headers first, or a web service explicitly documented as a streaming service), just leave off stream=True and use resp.content or resp.text for bytes or Unicode response data, respectively.

In the end, however, it appears your server is sending chunked responses that are malformed or incomplete; a chunked transfer encoding includes length information for each chunk and the server appears to be lying about a chunk length or sending too little data for a given chunk. The decode error is merely the result of incomplete data having been sent.
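The failure mode described above can be reproduced without a network, using only the standard library. The following is a minimal sketch, not the asker's actual server output: it feeds hand-written bytes to http.client.HTTPResponse (the class that appears in the traceback) through a hypothetical FakeSocket helper, and the payloads GOOD and BAD are made up for illustration.

```python
import http.client
import io


class FakeSocket:
    """Hypothetical stand-in for a socket: serves canned bytes to HTTPResponse."""

    def __init__(self, data: bytes):
        self._file = io.BytesIO(data)

    def makefile(self, *args, **kwargs):
        return self._file


def read_body(raw: bytes) -> bytes:
    """Parse a raw HTTP response and read its body to EOF, with no length argument."""
    resp = http.client.HTTPResponse(FakeSocket(raw))
    resp.begin()        # parse the status line and headers
    return resp.read()  # http.client decodes the chunked framing


HEADERS = b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n"

# Well-formed: each chunk is "<size in hex>\r\n<data>\r\n", ended by a zero-size chunk.
GOOD = HEADERS + b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"

# Malformed (assumed scenario): the first chunk claims 5 bytes but carries more,
# so the parser finds data where it expects the next chunk-size line.
BAD = HEADERS + b"5\r\nhelloEXTRA\r\n0\r\n\r\n"

good_body = read_body(GOOD)

try:
    read_body(BAD)
    bad_error = None
except http.client.IncompleteRead as exc:
    bad_error = exc  # exc.partial holds the bytes read before the failure
```

With a well-formed body, read() with no length returns everything; with the malformed one, the client raises IncompleteRead, which matches the shape of the traceback in the question.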

Upvotes: 4

hago

Reputation: 1710

The server you are requesting uses "chunked" transfer encoding, so there is no Content-Length header. A raw response in chunked transfer encoding contains not only the actual content but also the chunk framing: each chunk is prefixed by its size as a hexadecimal number followed by "\r\n", and that framing will cause an XML or JSON parser to fail.
Try using:

response.raw.read(decode_content=True)
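To make the chunk framing described above concrete, here is a minimal sketch of a decoder for a chunked body. The function name decode_chunked and the sample bytes are hypothetical, and trailers are ignored; in normal use the HTTP library removes this framing for you, so this is for illustration only.

```python
def decode_chunked(raw: bytes) -> bytes:
    """Strip chunked transfer-encoding framing from a raw body (illustrative only)."""
    body = bytearray()
    pos = 0
    while True:
        # Each chunk starts with its size in hex, terminated by CRLF.
        line_end = raw.index(b"\r\n", pos)
        size_field = raw[pos:line_end].split(b";", 1)[0]  # drop chunk extensions
        size = int(size_field, 16)
        pos = line_end + 2
        if size == 0:
            break  # a zero-size chunk marks the end (trailers are ignored here)
        chunk = raw[pos:pos + size]
        if len(chunk) < size:
            raise ValueError("chunk shorter than its declared size")
        body += chunk
        pos += size + 2  # skip the data and the CRLF that follows it
    return bytes(body)


# Made-up sample: two chunks of 6 and 7 bytes, then the terminating chunk.
wire = b"6\r\n<user>\r\n7\r\n</user>\r\n0\r\n\r\n"
decoded = decode_chunked(wire)
```

A chunk that declares more bytes than are actually present makes this sketch raise ValueError, which is the same kind of inconsistency that surfaces as IncompleteRead in the question.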

Upvotes: 1
