Hive + Tez :: A join query stuck at last 2 mappers for a long time

Question

I am joining a VIEWS table with a temp table (SCU_TMP), with the parameters below intentionally enabled.

SET hive.auto.convert.join=true;
SET hive.execution.engine=tez;

The query is:

CREATE TABLE STG_CONVERSION AS
SELECT CONV.CONVERSION_ID,
       CONV.USER_ID,
       TP.TIME,
       CONV.TIME AS ACTIVITY_TIME,
       TP.MULTI_DIM_ID,
       CONV.CONV_TYPE_ID,
       TP.SV1
FROM VIEWS TP
JOIN SCU_TMP CONV ON TP.USER_ID = CONV.USER_ID
WHERE TP.TIME <= CONV.TIME;

In the normal scenario, both tables can have any number of records.
However, in the SCU_TMP table, only 10-50 records are expected for any single User ID.

In some cases, though, a couple of User IDs arrive with around 10k-20k records each in the SCU_TMP table, which creates a cross-product effect in the join.
When that happens, the job runs seemingly forever, waiting on a single remaining mapper to complete.
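
To confirm which User IDs cause the skew, a check along these lines could be run against SCU_TMP before the join. This is only a sketch: the threshold of 1000 is an arbitrary assumption, not a value taken from the real data, and the alias RECORD_COUNT is made up for illustration.

```sql
-- Count SCU_TMP records per User ID and list only the heavy ones.
-- The 1000 cutoff is an assumed threshold; adjust it to the data.
SELECT USER_ID,
       COUNT(*) AS RECORD_COUNT
FROM SCU_TMP
GROUP BY USER_ID
HAVING COUNT(*) > 1000
ORDER BY RECORD_COUNT DESC;
```

Any User ID returned here is multiplied against every matching row in VIEWS, and all of those rows are handled by one task.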

Is there any way to optimise this and run this gracefully?
