Reputation: 2441
This is a stripped-down version of the script that causes continually increasing memory usage; I've seen it go past 600 MB after two minutes:
import requests
import grequests
lines = (grequests.get(l.strip(), timeout=15)
         for l in open('links.txt') if len(l.strip()))
for r in grequests.imap(lines, size=20):
    if r.ok:
        print r.url
links.txt is a file containing a large number of URLs; the problem happens with several large groups of URLs that I have collected. It seems to me that the response objects may not be getting dereferenced?
I updated gevent, requests and grequests today, here are their versions:
In [2]: gevent.version_info
Out[2]: (1, 0, 0, 'beta', 3)
In [5]: requests.__version__
Out[5]: '0.13.5'
grequests doesn't have a version number that I could find.
Thanks in advance for any answers.
Upvotes: 5
Views: 1846
Reputation: 988
The project's requests library dependency should be updated.
Older versions of requests, including the one used in the question, did not pre-fetch response content by default, leaving it up to you to consume the data. This keeps a reference to the underlying socket open, so even if the request session is garbage collected, the socket won't be released until the response goes out of scope or response.content is called.
In later versions of requests, responses are pre-fetched by default, and session connections are closed explicitly if the session was created ad hoc to fulfil a module-level get/post/etc. request, such as those made by grequests when a session isn't passed in. This is covered in requests GitHub issue #520.
Upvotes: 0
Reputation: 613
This answer is just an alias and link back for people who might need this link.
I use the imap function and requests.Session to reduce the memory usage while making 380k requests in my scripts.
Upvotes: 1
Reputation: 532
From my point of view, this happens because you try to open all of the links at the same time. Try something like this:
links = set(links)
while links:
    batch = (grequests.get(links.pop())
             for _ in range(min(200, len(links))))
    for r in grequests.imap(batch, size=20):
        # ...rest of your code
This code is not tested and you may find a nicer solution, but it should demonstrate that you are simply trying to open too many links at the same time, and that is what consumes your memory.
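The batching idea itself is independent of grequests; a small helper (the names are my own) that drains a set in fixed-size chunks looks like this:

```python
def batches(items, size):
    """Yield lists of at most `size` elements until `items` is exhausted."""
    pending = set(items)
    while pending:
        # pop at most `size` elements; min() guards the final short chunk
        yield [pending.pop() for _ in range(min(size, len(pending)))]
```

Each chunk can then be turned into requests and passed to grequests, so only one batch of responses is alive at any moment.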
Upvotes: 0