Reputation: 1107
I have the following code to loop over all entities of kind RawEmailModel and update counters:
import logging

from google.appengine.ext import ndb

def update_emails(cursor=None, stats=None):
    BATCH_SIZE = 100
    if stats is None:
        # Start of the job: avoid a mutable default argument.
        stats = {}
    next_cursor = cursor
    more = True
    try:
        while more:
            rawEmails, next_cursor, more = RawEmailModel.query().fetch_page(
                BATCH_SIZE, start_cursor=next_cursor)
            # Drop NDB's in-context cache so fetched entities can be freed.
            ndb.get_context().clear_cache()
            for rawEmail in rawEmails:
                stats[rawEmail.userId] = stats.get(rawEmail.userId, 0) + 1
            logging.debug(stats)
        logging.debug("Done counting")
    except Exception as e:
        logging.error(e)
I am clearing the NDB cache based on what I read in https://stackoverflow.com/a/12108891/2448805. However, I still get errors saying I'm running out of memory:
20:21:55.240 {u'104211720960924551911': 45622, u'105605183894399744988': 0, u'114651439835375426353': 2, u'112308898027744263560': 667, u'112185522275060884315': 804}
F 20:22:01.389 Exceeded soft private memory limit of 128 MB with 153 MB after servicing 14 requests total
W 20:22:01.390 While handling this request, the process that handled this request was found to be using too much memory and was terminated. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may have a memory leak in your application.
I don't get why I'm still running out of memory when I keep clearing the cache at the top of the loop. Thanks!
Upvotes: 1
Views: 519
Reputation: 3859
It looks like you have a large number of RawEmailModel entities, and your stats dict keeps growing until it hits the memory limit. ndb.get_context().clear_cache() is not going to help you here: it only empties NDB's in-context entity cache, while the dict is plain application state that is never released.
You may have to come up with another model to hold the counts, say RawEmailCounterModel with userId and total_count as fields, and keep updating it from the while loop instead of using your in-memory stats dict to do the counting, as sketched below.
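A minimal sketch of that idea, with one counter entity per user keyed by userId (flush_counts is just an illustrative name):

from google.appengine.ext import ndb

class RawEmailCounterModel(ndb.Model):
    # One counter entity per user; userId doubles as the key name.
    userId = ndb.StringProperty()
    total_count = ndb.IntegerProperty(default=0)

def flush_counts(page_counts):
    # page_counts: dict of userId -> count for a single fetched page.
    for user_id, count in page_counts.items():
        key = ndb.Key(RawEmailCounterModel, user_id)
        counter = key.get() or RawEmailCounterModel(key=key, userId=user_id)
        counter.total_count += count
        counter.put()

Inside the while loop you would tally into a small per-page dict, call flush_counts after each page, and then reset the dict, so memory stays bounded by the page size rather than by the total number of users.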
At least this will take care of the out-of-memory issue. But it may not be performant, since it adds a datastore read and write per user for every page.
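If those per-entity round trips turn out to be too slow, a standard NDB mitigation is to batch the reads and writes with ndb.get_multi and ndb.put_multi; a sketch:

def flush_counts_batched(page_counts):
    # Same idea as flush_counts, but one bulk read and one bulk write.
    user_ids = list(page_counts)
    keys = [ndb.Key(RawEmailCounterModel, uid) for uid in user_ids]
    counters = ndb.get_multi(keys)
    updated = []
    for uid, key, counter in zip(user_ids, keys, counters):
        if counter is None:
            # First time this user is seen: create its counter entity.
            counter = RawEmailCounterModel(key=key, userId=uid)
        counter.total_count += page_counts[uid]
        updated.append(counter)
    ndb.put_multi(updated)

Note that neither version is transactional; if this job can run concurrently with itself, you would want ndb.transaction or sharded counters on top of this.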
Upvotes: 1