Reputation: 2428
I would like to run a list of generated Hive queries. For each one, I would like to retrieve the MR job_id (or ids, in the case of multi-stage queries), and then, with that job_id, collect statistics from the JobTracker (cumulative CPU, bytes read, ...).
How can I send Hive queries from a Bash or Python script and retrieve the job_id(s)?
For the second part (collecting stats for the job), we're using an MRv1 Hadoop cluster, so I don't have the AppMaster REST API. I'm about to scrape the data from the JobTracker web UI. Any better ideas?
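For context, the Hive CLI normally prints a line like "Starting Job = job_..., Tracking URL = ..." on stderr for each MR stage, so one option is to scrape the ids from there. A minimal Python sketch, assuming that stderr format (the table name is a placeholder):

import re
import subprocess

def run_hive_query(query):
    # Run the query through the Hive CLI; the query results go to stdout,
    # while progress logging (including the job ids) goes to stderr.
    proc = subprocess.Popen(
        ["hive", "-e", query],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate()
    # One "Starting Job = job_..." line is printed per MR stage.
    job_ids = re.findall(r"Starting Job = (job_[^,\s]+)", err.decode())
    return out.decode(), job_ids

results, job_ids = run_hive_query("SELECT COUNT(*) FROM my_table")
print(job_ids)  # e.g. ['job_201301010000_0001']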
Upvotes: 3
Views: 15737
Reputation: 86
You can get the list of executed jobs by running this command:
hadoop job -list all
Then, for each job id, you can retrieve the stats with the command hadoop job -status <job-id>.
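A minimal Python sketch of that second step; the line-by-line "Name=value" parsing and the counter names ("CPU time spent (ms)", "HDFS_BYTES_READ") assume the usual MRv1 -status output and may vary across distributions:

import subprocess

def job_counters(job_id):
    # 'hadoop job -status' prints completion info followed by the
    # counters, one per line as 'Counter name=value' (MRv1 format).
    out = subprocess.check_output(["hadoop", "job", "-status", job_id])
    counters = {}
    for line in out.decode().splitlines():
        name, sep, value = line.strip().rpartition("=")
        if sep and value.lstrip("-").isdigit():
            counters[name] = int(value)
    return counters

stats = job_counters("job_201301010000_0001")
print(stats.get("CPU time spent (ms)"))  # cumulative CPU
print(stats.get("HDFS_BYTES_READ"))      # bytes read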
And for associating the jobs with a query, you can get the job name and match it against the query, as in How to get names of the currently running hadoop jobs? Another option is to tag each query with a job name you control before submitting it, along these lines:
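A sketch of that tagging idea: mapred.job.name is the MRv1 job-name property, but whether Hive lets you override it this way depends on your version, so treat this as an assumption to verify. Note that with multi-stage queries, all stages would share the tag.

import subprocess

def run_tagged_query(query, tag):
    # Pass a job name through to the MR jobs this query spawns, so that
    # 'hadoop job -list all' output can be matched back to the query.
    subprocess.check_call([
        "hive",
        "--hiveconf", "mapred.job.name=" + tag,
        "-e", query,
    ])

run_tagged_query("SELECT COUNT(*) FROM my_table", "stats_run_42")  # tag is arbitrary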
Hope this helps.
Upvotes: 3