Leo Romanovsky

Reputation: 1790

Google python map-reduce library returning a NULL context

In my mapper, mapreduce.context.get() is returning None instead of a context:

class DeleteOldObservationsMapper(object):
  """Mapper for deleting old observations."""

  def __init__(self):
    logging.info('DeleteOldObservationsMapper init')
    ctx = mapreduce.context.get()
    when = ctx.mapreduce_spec.mapper.params.get('before_timestamp_seconds')
    assert when
    self.before_datetime = datetime.datetime.utcfromtimestamp(when)
    logging.info('before_datetime %s', self.before_datetime)

Here is the error trace:

ERROR    2013-05-24 16:03:38,662 webapp2.py:1552] 'NoneType' object has no attribute 'mapreduce_spec'
Traceback (most recent call last):
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1535, in __call__
    rv = self.handle_exception(request, response, e)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1529, in __call__
    rv = self.router.dispatch(request, response)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1278, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1102, in __call__
    return handler.dispatch()
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 572, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 570, in dispatch
    return method(*args, **kwargs)
  File "/Users/leoromanovsky/code/adapt/server/mapreduce/base_handler.py", line 66, in post
    self.handle()
  File "/Users/leoromanovsky/code/adapt/server/mapreduce/handlers.py", line 320, in handle
    tstate = model.TransientShardState.from_request(self.request)
  File "/Users/leoromanovsky/code/adapt/server/mapreduce/model.py", line 993, in from_request
    handler = mapreduce_spec.mapper.handler
  File "/Users/leoromanovsky/code/adapt/server/mapreduce/model.py", line 618, in get_handler
    return util.handler_for_name(self.handler_spec)
  File "/Users/leoromanovsky/code/adapt/server/mapreduce/util.py", line 149, in handler_for_name
    return getattr(resolved_name.im_class(), resolved_name.__name__)
  File "/Users/leoromanovsky/code/adapt/server/jobs.py", line 22, in __init__
    when = ctx.mapreduce_spec.mapper.params.get('before_timestamp_seconds')
AttributeError: 'NoneType' object has no attribute 'mapreduce_spec'

Upvotes: 3

Views: 410

Answers (2)

Matt Faus

Reputation: 6671

We recently hit this problem with a threadsafe: false project, and were able to resolve the issue by changing how we were importing the context module. Kevin explains the issue well in this bug report.

Maybe this will help someone else, but I was seeing this issue and just figured it out for my use case. In our project, we have the mapreduce library in the subfolder "libs/external/mapreduce" and not in the root of our project.

The library imports context (among other things) from itself as from mapreduce import context. So to make it work we are using import manipulation like so:

import os, sys 
sys.path.append(os.path.join(os.path.dirname(__file__), 'libs/external'))

However, in a few places in our code we were still importing context like so:

from libs.external.mapreduce import context

This will actually cause context to get imported twice, once as mapreduce.context and once as libs.external.mapreduce.context, so mapreduce.context.Context._local will get set twice, giving two different instances of thread._local. When Context._set is called to store the context for later retrieval in a mapper function or something, it will be using the original thread._local instance.

Later, when our mapper module imports context again under the other name, it gets a new thread._local instance. When we then called context.get(), it read from the new instance, which didn't actually hold the context, so it returned None:

AttributeError: 'NoneType' object has no attribute 'mapreduce_spec'

Switching all our imports to from mapreduce import context fixed it for us.
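
The double-import effect described above can be reproduced without the mapreduce library. The following is only a minimal sketch: the package name fakemr, the demo_libs/external layout, and the tiny context module are hypothetical stand-ins written for this illustration, not the library's actual code. It shows that importing one file under two different module names yields two module objects, each with its own thread-local storage.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway layout: <root>/demo_libs/external/fakemr/context.py
# (all of these names are hypothetical stand-ins for the real project).
root = tempfile.mkdtemp()
external_dir = os.path.join(root, "demo_libs", "external")
pkg_dir = os.path.join(external_dir, "fakemr")
os.makedirs(pkg_dir)
for d in (os.path.join(root, "demo_libs"), external_dir, pkg_dir):
    open(os.path.join(d, "__init__.py"), "w").close()

# A tiny module that keeps its state in a module-level thread-local.
with open(os.path.join(pkg_dir, "context.py"), "w") as f:
    f.write(
        "import threading\n"
        "_local = threading.local()\n"
        "def _set(ctx):\n"
        "    _local.ctx = ctx\n"
        "def get():\n"
        "    return getattr(_local, 'ctx', None)\n"
    )

# Make the same file reachable under two different dotted names.
sys.path.insert(0, root)
sys.path.insert(0, external_dir)
importlib.invalidate_caches()

short = importlib.import_module("fakemr.context")
long_ = importlib.import_module("demo_libs.external.fakemr.context")

short._set("the context")      # stored in the first module's thread-local
same_module = short is long_   # False: two separate module objects
via_short = short.get()        # "the context"
via_long = long_.get()         # None: the second module has its own storage

print(same_module, via_short, via_long)
```

Python caches modules by their dotted name, so the two names never resolve to the same module object even though they point at the same file. That is why using a single import path everywhere avoids the problem.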

Upvotes: 2

Robsdedude

Reputation: 1403

I think this is the answer to your problem:

http://code.google.com/p/appengine-mapreduce/issues/detail?id=127

mapreduce.context.get() is simply not thread-safe...

So what you could do is wrap it in a helper that makes it thread-safe by using a lock.

Upvotes: 1
