Vijiy
Vijiy

Reputation: 1197

How to Identify total number of jobs required to execute hive query

Is there a way to identify, total number of jobs required to execute a query.

For Example in the below 2 queries, number of joins and subquery are same but one query would require 2 jobs where as other requires 3

 select t1.item_dim_key hive, t2.item_dim_key as monet 
   from ext_dist_it_dim_key t1 
        left outer join (select distinct item_dim_key from PO_ITEM_DIM) t2 on t1.item_dim_key=t2.item_dim_key 
   where t2.item_dim_key is null;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = 20190208020329_258ee4c0-5819-4842-b479-d549c82a0779
**Total jobs = 3**

hive> select t1.item_dim_key hive, t2.item_dim_key as monet 
       from (select distinct item_dim_key from PO_ITEM_DIM) t1 
            left outer join ext_dist_it_dim_key t2 on t1.item_dim_key=t2.item_dim_key 
      where t2.item_dim_key is null;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = 20190208020624_9ea3dc20-ffc8-4461-9516-7a4770d1dd6b
**Total jobs = 2**

Is it possible to know how many jobs would it take for a query to execute? What are the parameters required to calculate number of jobs.

Thanks

Upvotes: 0

Views: 424

Answers (1)

leftjoin
leftjoin

Reputation: 38290

Use EXPLAIN, it shows the query execution plan. Only plan can help to answer this question for sure. Based on statistics or table(files) size, Optimizer can convert some joins to map-joins, etc.

Upvotes: 1

Related Questions