Kev

Reputation: 313

GAE MapReduce: how to write multiple outputs

I have a data set on which I run multiple mappings.

Assuming the reduce function receives three key-value pairs, how do I change the output so that I get three blob files, one for each key-value pair?

Do let me know if I can clarify further.

Upvotes: 0

Views: 152

Answers (1)

Alice

Reputation: 485

I don't think such functionality exists (yet?) in the GAE Mapreduce library.

Depending on the size of your dataset and the type of output required, you can work around this with little effort by co-opting the reducer as an additional output writer. For example, if one of the reducer outputs should go straight back to the datastore and another should go to a file, you could open a file yourself inside the reducer and write that output to it. Alternatively, you could serialize the intermediate map results, store them explicitly in the datastore as temporary entities using operation.db.Put, and then run separate map or reduce jobs over those entities. Of course, that will end up being more expensive than the first workaround.
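To make the first workaround concrete, here is a minimal sketch in plain Python with no App Engine APIs involved. The function name `reduce_with_side_output`, the tab-separated record format, and the in-memory `io.StringIO` standing in for a real file are all assumptions made for illustration. In an actual sharded job, each reducer shard would presumably need its own side file.

```python
import io


def reduce_with_side_output(key, values, side_file):
    """Hypothetical reducer: yields the primary output, writes a second output itself."""
    total = 0
    for value in values:
        total += int(value)
        # Side output: written directly by the reducer, bypassing the job's output writer.
        side_file.write("%s\t%s\n" % (key, value))
    # Primary output: handed to whatever output writer the job is configured with.
    yield "%s\t%d\n" % (key, total)


# Stand-in for a file the reducer opened itself.
side = io.StringIO()
primary = list(reduce_with_side_output("a", ["1", "2"], side))
```

Here `primary` holds what the job's normal output writer would receive, while `side` collects the records the reducer wrote on its own.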

In your specific key-value example, I'd suggest writing everything to a single Google Cloud Storage file and post-processing it to split it into three files as required. That also gives you more control over the final file names.
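The post-processing step could look roughly like the sketch below. It uses local files and the standard library only, so reading from and writing to Cloud Storage is left out. The helper name `split_by_key`, the `output-<key>.txt` naming scheme, and the assumption that the combined file holds one `key<TAB>value` record per line (with keys that are safe to use in file names) are all hypothetical.

```python
import os
import tempfile


def split_by_key(combined_path, output_dir):
    """Split a combined 'key<TAB>value' file into one file per key.

    Returns a dict mapping each key to the path of its output file.
    """
    handles = {}
    try:
        with open(combined_path) as src:
            for line in src:
                if not line.strip():
                    continue  # skip blank lines
                key = line.partition("\t")[0].strip()
                if key not in handles:
                    # Assumed naming scheme; choose whatever final names you need.
                    path = os.path.join(output_dir, "output-%s.txt" % key)
                    handles[key] = open(path, "w")
                handles[key].write(line if line.endswith("\n") else line + "\n")
    finally:
        for handle in handles.values():
            handle.close()
    return {key: handle.name for key, handle in handles.items()}


# Usage with a small made-up combined file.
workdir = tempfile.mkdtemp()
combined = os.path.join(workdir, "combined.txt")
with open(combined, "w") as f:
    f.write("a\t1\nb\t2\na\t3\nc\t4\n")
paths = split_by_key(combined, workdir)
```

Keeping one open handle per key is fine for three keys. With many distinct keys you would want to sort the input by key first and write one file at a time.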

Upvotes: 2
