mathieu

Reputation: 2428

Run Hive queries and collect job information

I would like to run a list of generated Hive queries. For each one, I would like to retrieve the MR job_id (or job_ids, in the case of multiple stages), and then, with this job_id, collect statistics from the JobTracker (cumulative CPU, bytes read, ...).

How can I send Hive queries from a Bash or Python script and retrieve the job_id(s)?

For the second part (collecting stats for the job), we're using an MRv1 Hadoop cluster, so I don't have the ApplicationMaster REST API. I'm about to scrape the data from the JobTracker web UI. Any better idea?
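
To make the first part concrete, here is a rough sketch of the kind of thing I'm after: run each query with the Hive CLI and scrape its stderr, where each MR stage logs a line like "Starting Job = job_..., Tracking URL = ...". This is untested, and the run_hive_query helper name is mine:

    import re
    import subprocess

    def run_hive_query(query):
        # Run one query with the Hive CLI; MR progress is logged to stderr.
        proc = subprocess.Popen(["hive", "-e", query],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                universal_newlines=True)
        stdout, stderr = proc.communicate()
        # One "Starting Job = job_..." line per MR stage of the query.
        return re.findall(r"Starting Job = (job_\w+)", stderr)

    # job_ids = run_hive_query("SELECT COUNT(*) FROM my_table")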

Upvotes: 3

Views: 15737

Answers (1)

gsps

Reputation: 86

You can get the list of executed jobs by running this command:

hadoop job -list all
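
If you are scripting this in Python, a minimal sketch (assuming the MRv1 output format, where each data row starts with the job id):

    import subprocess

    def list_all_jobs():
        # Return the job ids reported by 'hadoop job -list all',
        # skipping the header lines that precede the job rows.
        out = subprocess.check_output(["hadoop", "job", "-list", "all"],
                                      universal_newlines=True)
        return [line.split()[0] for line in out.splitlines()
                if line.startswith("job_")]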

Then, for each job_id, you can retrieve the stats with the command:

hadoop job -status <job_id>
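
For example, a hedged sketch wrapping that command (the exact fields in the -status output vary by Hadoop version, so this just returns the raw text for further parsing):

    import subprocess

    def job_status(job_id):
        # Raw output of 'hadoop job -status <job_id>': completion
        # percentages plus the job counters (CPU, bytes read, ...).
        return subprocess.check_output(["hadoop", "job", "-status", job_id],
                                       universal_newlines=True)

    # status_text = job_status("job_201302211510_0001")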

And to associate the jobs with a query, you can get the job_name and match it against the query, something like this: How to get names of the currently running hadoop jobs?
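
One way to make that matching deterministic is to tag each generated query with its own job name before submitting it (mapred.job.name is the standard MRv1 property, and Hive forwards "set" commands into the job configuration). A sketch, reusing the hypothetical run_hive_query helper from the question:

    def run_tagged_query(tag, query):
        # Prefix the query with a job-name tag so the MR jobs it spawns
        # can be matched back to it on the JobTracker. Note that all
        # stages of a multi-stage query will share the same name.
        tagged = "set mapred.job.name=%s; %s" % (tag, query)
        return run_hive_query(tagged)

    # job_ids = run_tagged_query("query_0042", "SELECT COUNT(*) FROM t")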

Hope this helps.

Upvotes: 3
