Reputation: 313
I have a data set which I do multiple mappings on.
Assuming that I have 3 key-values pair for the reduce function, how do I modify the output such that I have 3 blobfiles - one for each of the key value pair?
Do let me know if I can clarify further.
Upvotes: 0
Views: 152
Reputation: 485
I don't think such functionality exists (yet?) in the GAE Mapreduce library.
Depending on the size of your dataset, and the type of output required, you can small-time-investment hack your way around it by co-opting the reducer as another output writer. For example, if one of the reducer outputs should go straight back to the datastore, and another output should go to a file, you could open a file yourself and write the outputs to it. Alternatively, you could serialize and explicitly store the intermediate map results to a temporary datastore using operation.db.Put
, and perform separate Map or Reduce jobs on that datastore. Of course, that will end up being more expensive than the first workaround.
In your specific key-value example, I'd suggest writing to a Google Cloud Storage File, and postprocessing it to split it into three files as required. That'll also give you more control over final file names.
Upvotes: 2