richard rodrigues

Reputation: 71

GAE mapreduce generator error: no attribute 'validate_bucket_name'

This is my first GAE project. My serial code works on the dev_appserver (I am using the GoogleAppEngineLauncher on Mac). Since it takes too long to finish, I am trying to use mapreduce to speed it up. I tried the code below but keep getting the error shown after it. I am not sure whether this is caused by a mistake in my code or by something missing from the *.yaml files. Kindly help!

class ShuffleDictPipeline(base_handler.PipelineBase):
  def run(self, *args, **kwargs):
    """ run """
    mapper_params = {
        "entity_kind": "coremic.RandomDict",
        "batch_size": 500,
        "filters": [("idx", "=", ndb_custom_key)]
    }
    reducer_params = {
        "mime_type": "text/plain"
    }
    output = yield mapreduce_pipeline.MapreducePipeline(
        "calc_shuff_core_microb",
        mapper_spec="coremic.shuffle_dict_coremic_map",
        mapper_params=mapper_params,
        reducer_spec="coremic.shuffle_dict_coremic_reduce",
        reducer_params=reducer_params,
        input_reader_spec="mapreduce.input_readers.DatastoreInputReader",
        output_writer_spec="mapreduce.output_writers.BlobstoreOutputWriter",
        shards=16)

    yield StoreOutput(output)

Error:

ERROR    2016-03-05 20:03:21,706 pipeline.py:2432] 
Generator mapreduce.mapper_pipeline.MapperPipeline(*(u'calc_shuff_core_microb-map', u'coremic.shuffle_dict_coremic_map', u'mapreduce.input_readers.DatastoreInputReader'), **{'output_writer_spec': u'mapreduce.output_writers._GoogleCloudStorageKeyValueOutputWriter', 'params': {u'batch_size': 500, u'bucket_name': u'app_default_bucket', u'entity_kind': u'coremic.RandomDict',... (324 bytes))#b96dd511c0454fd99413d267b7388857 raised exception. AttributeError: 'NoneType' object has no attribute 'validate_bucket_name'

Traceback (most recent call last):
  File "/Users/rr/GAE/coremic/pipeline/pipeline.py", line 2156, in evaluate
    self, pipeline_key, root_pipeline_key, caller_output)
  File "/Users/rr/GAE/coremic/pipeline/pipeline.py", line 1110, in _run_internal
    return self.run(*self.args, **self.kwargs)
  File "/Users/rr/GAE/coremic/mapreduce/mapper_pipeline.py", line 102, in run
    queue_name=self.queue_name,
  File "/Users/rr/GAE/coremic/mapreduce/control.py", line 125, in start_map
    in_xg_transaction=in_xg_transaction)
  File "/Users/rr/GAE/coremic/mapreduce/handlers.py", line 1730, in _start_map
    mapper_output_writer_class.validate(mapper_spec)
  File "/Users/rr/GAE/coremic/mapreduce/output_writers.py", line 1075, in validate
    return cls.WRITER_CLS.validate(mapper_spec)
  File "/Users/rr/GAE/coremic/mapreduce/output_writers.py", line 723, in validate
    super(_GoogleCloudStorageOutputWriter, cls).validate(mapper_spec)
  File "/Users/rr/GAE/coremic/mapreduce/output_writers.py", line 604, in validate
    cloudstorage.validate_bucket_name(
AttributeError: 'NoneType' object has no attribute 'validate_bucket_name'

Upvotes: 1

Views: 124

Answers (3)

Hernán Acosta

Reputation: 695

Install GoogleAppEngineCloudStorageClient in your project.

output_writers.py does the following:

try:
  # Check if the full cloudstorage package exists. The stub part is in runtime.
  cloudstorage = None
  import cloudstorage
  if hasattr(cloudstorage, "_STUB"):
    cloudstorage = None
  # "if" is needed because apphosting/ext/datastore_admin:main_test fails.
  if cloudstorage:
    from cloudstorage import cloudstorage_api
    from cloudstorage import errors as cloud_errors
except ImportError:
  pass  # CloudStorage library not available

So when importing cloudstorage fails, the cloudstorage variable stays None. The later call to cloudstorage.validate_bucket_name is then made on None, which raises the AttributeError in your traceback.
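The failure mode can be reproduced without App Engine. The sketch below imitates the guarded import using a deliberately nonexistent module name (`cloudstorage_not_installed` is a hypothetical stand-in, not a real package), so the fallback to None is always taken. The `validate` and `try_validate` functions are also illustrative and are not part of the mapreduce library.

```python
# Stand-in for the guarded import in output_writers.py. The module name is
# hypothetical and chosen so that the import always fails.
cloudstorage = None
try:
    import cloudstorage_not_installed as cloudstorage
except ImportError:
    pass  # library not available, so cloudstorage stays None


def validate(bucket_name):
    """Imitate the writer's validate(): call a function on the module object."""
    return cloudstorage.validate_bucket_name(bucket_name)


def try_validate(bucket_name):
    """Return the error message instead of raising, for demonstration."""
    try:
        validate(bucket_name)
    except AttributeError as exc:
        return str(exc)
    return None


message = try_validate("app_default_bucket")
print(message)
```

Once the real client library is importable from the project, the module object is no longer None and the attribute lookup can succeed.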

Upvotes: 0

richard rodrigues

Reputation: 71

I am still working on getting everything to work, but a couple of things helped.

1.1 Install the Google Cloud Storage client library in the SDK so the app can access the bucket: https://cloud.google.com/appengine/docs/python/googlecloudstorageclient

1.2 Set up (create) the bucket.

Then follow the steps from https://plus.google.com/+EmlynORegan/posts/6NPaRKxMkf3
Note how the mapper params have changed.

2 - In the mapreduce pipeline, replace "mapreduce.output_writers.BlobstoreOutputWriter" with "mapreduce.output_writers.GoogleCloudStorageConsistentOutputWriter"

3 - Update the reducer params to:

{ "mime_type": "text/plain", "output_writer": { "bucket_name": "<your bucket name>", "tmp_bucket_name": "<your temporary bucket name>" } }
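Steps 2 and 3 can be sketched as a small helper that assembles the keyword arguments for the pipeline. This is only an illustration: `build_mapreduce_kwargs`, the bucket name, and the choice to reuse the main bucket as the temporary one are assumptions, while the writer spec and the parameter keys are the ones named in the steps. It builds plain dictionaries, so it runs without the mapreduce library.

```python
def build_mapreduce_kwargs(bucket_name, tmp_bucket_name=None):
    """Assemble keyword arguments for a Cloud Storage output writer.

    Hypothetical helper: only the writer spec and the parameter keys come
    from the steps above; the name and defaults are illustrative.
    """
    if not bucket_name:
        raise ValueError("bucket_name must be a non-empty string")
    reducer_params = {
        "mime_type": "text/plain",
        "output_writer": {
            "bucket_name": bucket_name,
            # Assumption: fall back to the main bucket for temporary files.
            "tmp_bucket_name": tmp_bucket_name or bucket_name,
        },
    }
    return {
        "output_writer_spec": (
            "mapreduce.output_writers."
            "GoogleCloudStorageConsistentOutputWriter"
        ),
        "reducer_params": reducer_params,
    }


# "my-example-bucket" is a placeholder, not a real bucket.
kwargs = build_mapreduce_kwargs("my-example-bucket")
print(kwargs["reducer_params"]["output_writer"])
```

The resulting dictionary could then be unpacked into the MapreducePipeline call alongside the job name, mapper and reducer specs, and mapper params.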

Another very useful link:
https://gist.github.com/nlathia/ab670053ed460c4ca02f/89178e132b894fe5467c09164d3827f70e4ae2f8

Upvotes: 1

Jeffrey Godwyll

Reputation: 3893

You can do one of two things. Either

  1. Create a Google Cloud Storage bucket for your project; at the moment none is associated with it, hence the NoneType. Once that is done, add the bucket name to your mapper_params.

    mapper_params = {
        ...
        "bucket_name": "<your google cloud storage bucket name>",
        ...
    }
    

OR

  2. Create a default bucket from your App Engine application settings in the Cloud Console: https://console.cloud.google.com/appengine/settings?project=

    Create Default Bucket
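For the first option, a minimal sketch of adding the bucket name to the existing mapper params without modifying the original dictionary. `with_bucket` and the bucket name are hypothetical; the other keys are taken from the question.

```python
import copy


def with_bucket(mapper_params, bucket_name):
    """Return a copy of mapper_params with "bucket_name" set.

    Hypothetical helper, shown only to illustrate where the key goes.
    """
    if not bucket_name:
        raise ValueError("bucket_name must be a non-empty string")
    updated = copy.deepcopy(mapper_params)
    updated["bucket_name"] = bucket_name
    return updated


original = {
    "entity_kind": "coremic.RandomDict",
    "batch_size": 500,
}
# "my-example-bucket" is a placeholder, not a real bucket.
updated = with_bucket(original, "my-example-bucket")
print(updated)
```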

Upvotes: 0
