Istvan

Reputation: 8572

Hive query with efficient join

Can the following query be rewritten so that the WHERE clause on eventdate first produces a subset of the big table, which is then joined to the small table and filtered further?

SELECT *
FROM big_table x
JOIN small_table y 
ON trim(x.ip_adress) = trim(y.ip_address)
WHERE eventdate = '2013-09-01'
AND unix_timestamp(cast(x.date AS TIMESTAMP)) - unix_timestamp(cast(y.date AS TIMESTAMP)) < 100
LIMIT 5;

Upvotes: 1

Views: 450

Answers (1)

Fabien TheSolution

Reputation: 5050

SELECT *
FROM ( SELECT *
       FROM big_table
       WHERE eventdate = '2013-09-01') x
JOIN small_table y ON trim(x.ip_adress) = trim(y.ip_address) AND 
                      unix_timestamp(cast(x.date AS TIMESTAMP)) - 
                          unix_timestamp(cast(y.date AS TIMESTAMP)) < 100
LIMIT 5;
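
If the Hive version in use rejects the non-equality condition in the ON clause (as far as I know, releases before 2.2.0 accept only equality conditions there), a variant that keeps just the equality in ON and moves the time-difference test to WHERE should return the same rows, since this is an inner join. A sketch only, reusing the table and column names from the question and assuming eventdate is a column of big_table:

```sql
-- Filter the big table first, so only one day's rows reach the join.
SELECT *
FROM ( SELECT *
       FROM big_table
       WHERE eventdate = '2013-09-01') x
-- Keep only the equality condition in ON.
JOIN small_table y ON trim(x.ip_adress) = trim(y.ip_address)
-- For an inner join, filtering here is equivalent to filtering in ON.
WHERE unix_timestamp(cast(x.date AS TIMESTAMP)) -
          unix_timestamp(cast(y.date AS TIMESTAMP)) < 100
LIMIT 5;
```

Running EXPLAIN on either form shows whether the eventdate filter is applied before the join.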

Upvotes: 1
