Elodin

Reputation: 648

Why is urllib.request so slow?

When I use urllib.request to fetch a JSON response and decode it into a Python dictionary, it takes far too long. However, upon looking at the data, I realized that I don't even want all of it.

  1. Is there any way that I can get only some of the data, for example the data from one of the keys of the JSON dictionary rather than all of them?
  2. Alternatively, is there any faster way to get the data? That could work as well.
  3. Or is it simply a problem with the connection that cannot be helped?
  4. Also, is the problem with urllib.request.urlopen, with .read().decode(), or with json.loads?

The main symptom of the problem is that it takes roughly 5 seconds to receive information that is not even that much (less than one page of an unformatted dictionary). The other symptom is that as I try to receive more and more information, at some point I simply receive no response from the webpage at all!

The two lines that take up the most time are:

import json
import urllib.request

response = urllib.request.urlopen(url)  # url is a string containing the URL
data = json.loads(response.read().decode())

For context, this is part of a project that uses the Edamam Recipe API.

Help would be appreciated.

Upvotes: 2

Views: 3560

Answers (1)

bruno desthuilliers

Reputation: 77902

Is there any way that I can get only some of the data, for example the data from one of the keys of the JSON dictionary rather than all of them?

You could try a streaming JSON parser, but I don't think you're going to get any speedup from it.
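If you do want to try it anyway, here is a minimal sketch using the third-party ijson library (pip install ijson). The "hits.item" prefix is an assumption about the shape of the Edamam response; adjust it to whichever key you actually need, and the URL is a placeholder:

import urllib.request

import ijson  # third-party streaming JSON parser

url = "https://api.edamam.com/search?..."  # placeholder, use your actual URL

# Yields each object under the top-level "hits" array as soon as it is
# parsed, instead of materializing the whole document in memory first.
with urllib.request.urlopen(url) as response:
    for hit in ijson.items(response, "hits.item"):
        print(hit)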

Alternatively, is there any faster way to get the data? That could work as well.

If you have to retrieve a JSON document from a URL and parse the JSON content, I fail to imagine what could be faster than sending an HTTP request, reading the response content and parsing it.
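One marginal simplification, though not a real speedup: on Python 3.6+, json.loads accepts bytes directly and detects the encoding itself, so the explicit .decode() step can be dropped (the URL below is a placeholder):

import json
import urllib.request

url = "https://api.edamam.com/search?..."  # placeholder, use your actual URL

with urllib.request.urlopen(url) as response:
    data = json.loads(response.read())  # no .decode() needed on Python 3.6+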

Or is it simply a problem with the connection that cannot be helped?

Given the figures you mention, the issue almost certainly lies in the networking part indeed, which means anything between your Python process and the server's process. Note that this includes your whole system (proxy/firewall, your network card, your OS TCP/IP stack, etc., and possibly some antivirus on Windows), your network itself, and of course the end server, which may be slow or a bit overloaded at times, or just deliberately throttling your requests to avoid overload.

Also, is the problem with urllib.request.urlopen, with .read().decode(), or with json.loads?

How can we know without timing it on your own machine? You can easily check this yourself: just time the execution of the various parts and log the results.
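For example, a minimal timing sketch using the standard library's time.perf_counter (the URL is a placeholder):

import json
import time
import urllib.request

url = "https://api.edamam.com/search?..."  # placeholder, use your actual URL

t0 = time.perf_counter()
response = urllib.request.urlopen(url)  # open the connection, get headers
t1 = time.perf_counter()
raw = response.read()                   # download the response body
t2 = time.perf_counter()
data = json.loads(raw.decode())         # decode and parse the JSON
t3 = time.perf_counter()

print(f"urlopen: {t1 - t0:.3f}s  read: {t2 - t1:.3f}s  parse: {t3 - t2:.3f}s")

If urlopen and read dominate, the time is going to the network; if json.loads dominates (unlikely for a one-page document), it's the parsing.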

The other symptom is that as I try to receive more and more information, there is a point when I simply receive no response from the webpage at all!

Cf. above: if you're sending hundreds of requests in a row, the server might either throttle your requests to avoid overload (most API endpoints behave that way) or just plain be overloaded. Do you at least check the HTTP response status code? You may get 503 (server overloaded) or 429 (too many requests) responses.
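Note that urllib raises urllib.error.HTTPError for non-2xx responses rather than returning them, so checking the status code looks something like this (placeholder URL again):

import json
import urllib.error
import urllib.request

url = "https://api.edamam.com/search?..."  # placeholder, use your actual URL

try:
    with urllib.request.urlopen(url) as response:
        data = json.loads(response.read().decode())
except urllib.error.HTTPError as exc:
    # 429 = too many requests, 503 = service unavailable / overloaded
    print(f"request failed with HTTP {exc.code}: {exc.reason}")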

Upvotes: 2
