Alice

Reputation: 485

How do I find logs specific to a single MapReduce job on Google App Engine?

The application I'm working on runs a lot of MapReduce cron jobs, and from time to time some of them produce errors (mostly ApplicationErrors, TransientErrors, DatabaseErrors, TimeOuts, etc.) that are somewhat sporadic and for the most part don't bother me.

However, while debugging and testing, I can't tell which job caused which error. The logs usually give me only the instance, with no hint of the job ID. The URL is just the generic /mapreduce/worker_callback, so that is no help either.

Am I missing something, or is there really no way to determine which log entry belongs to which MR pipeline, or, the other way around, to find the logs specific to a certain MR pipeline?

Upvotes: 1

Views: 415

Answers (1)

dlebech

Reputation: 1839

In your log, you have task_name=appengine-mrshard-158112310423699B53FC1-22-0. The 158112310423699B53FC1 part corresponds to a specific Job ID. The details for this Job can usually be found at url-to-your-app/mapreduce. That way, you can find the name that you have given to the job.
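
If you want to pull the Job ID out of log lines in bulk, something like the following could work. It is only a sketch: the task-name layout (an appengine-mrshard- prefix, the Job ID, then two numeric fields) is inferred from the single example above, and job_id_from_task_name is a made-up helper name, not part of the MapReduce library.

```python
import re

# Assumed layout, inferred from one example task name:
#   appengine-mrshard-<job id>-<number>-<number>
_TASK_NAME_RE = re.compile(
    r"^appengine-mrshard-(?P<job_id>[0-9A-Za-z]+)-\d+-\d+$"
)


def job_id_from_task_name(task_name):
    """Return the Job ID embedded in a shard task name, or None if no match."""
    match = _TASK_NAME_RE.match(task_name)
    return match.group("job_id") if match else None


print(job_id_from_task_name("appengine-mrshard-158112310423699B53FC1-22-0"))
# prints 158112310423699B53FC1
```

Task names that do not follow this layout simply return None, so the helper can be run over every log line without raising.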

View details for a job

To see the details for the specific Job ID (e.g. 158112310423699B53FC1):

appid.appspot.com/mapreduce/detail?mapreduce_id=158112310423699B53FC1
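
To avoid pasting IDs into URLs by hand, the detail link can be built with a small helper. This is a sketch under a few assumptions: the https scheme and the function name mapreduce_detail_url are mine, and appid.appspot.com is a placeholder for your own app's host.

```python
from urllib.parse import urlencode


def mapreduce_detail_url(app_host, job_id):
    """Build the MapReduce detail-page URL for a given Job ID.

    app_host is a placeholder such as "appid.appspot.com"; the https
    scheme is an assumption, not something stated in the answer.
    """
    query = urlencode({"mapreduce_id": job_id})
    return "https://{}/mapreduce/detail?{}".format(app_host, query)


print(mapreduce_detail_url("appid.appspot.com", "158112310423699B53FC1"))
# prints https://appid.appspot.com/mapreduce/detail?mapreduce_id=158112310423699B53FC1
```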

View entire pipeline

Finding the root Pipeline ID from the Job ID is possible using the following steps.

  1. Query the _AE_MR_MapreduceState table with the Job ID. Using the datastore viewer:

    SELECT * FROM _AE_MR_MapreduceState WHERE __key__ = Key('_AE_MR_MapreduceState','158112310423699B53FC1')
    

    The Pipeline ID can be found in the mapreduce_spec column as pipeline_id.

  2. The found Pipeline ID is probably not the root pipeline ID. To find the root Pipeline ID, query _AE_Pipeline_Record. Using the datastore viewer:

    SELECT * FROM _AE_Pipeline_Record WHERE __key__ = Key('_AE_Pipeline_Record', '653a3bd9a90f11e28ff6a3556e435fbc')
    

    The root_pipeline column holds the key of the root pipeline for the MapReduce job.

  3. Finally, using the name of the root pipeline key, you can view the entire MapReduce pipeline here:

    appid.appspot.com/mapreduce/pipeline/status?root=0607a90aa90f11e2bbfea3556e435fbc
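
The queries and the final URL in the steps above all follow the same pattern, so they can be generated rather than typed. A minimal sketch, assuming you still paste the GQL into the datastore viewer yourself; key_lookup_gql and pipeline_status_url are hypothetical helper names, and the https scheme is an assumption.

```python
from urllib.parse import urlencode


def key_lookup_gql(kind, key_name):
    """Build a GQL query that fetches one entity of `kind` by its key name."""
    return "SELECT * FROM {0} WHERE __key__ = Key('{0}', '{1}')".format(
        kind, key_name
    )


def pipeline_status_url(app_host, root_pipeline_id):
    """Build the pipeline status URL for a root Pipeline ID.

    app_host is a placeholder such as "appid.appspot.com".
    """
    query = urlencode({"root": root_pipeline_id})
    return "https://{}/mapreduce/pipeline/status?{}".format(app_host, query)


# Step 1: look up the MapReduce state by Job ID.
print(key_lookup_gql("_AE_MR_MapreduceState", "158112310423699B53FC1"))
# Step 2: look up the pipeline record by the Pipeline ID found in step 1.
print(key_lookup_gql("_AE_Pipeline_Record", "653a3bd9a90f11e28ff6a3556e435fbc"))
# Step 3: open the status page for the root pipeline found in step 2.
print(pipeline_status_url("appid.appspot.com", "0607a90aa90f11e2bbfea3556e435fbc"))
```

The IDs are the example values from the steps; the Pipeline ID and root Pipeline ID still have to be read out of the query results by hand.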

Upvotes: 1
