Shariq
Shariq

Reputation: 249

Write a nested select statement with a where clause in Hive

I have a requirement to do a nested select within a where clause in a Hive query. A sample code snippet would be as follows;

select *
from TableA
where TA_timestamp > (select timestmp from TableB where id="hourDim")

Is this possible or am I doing something wrong here, because I am getting an error while running the above script ?!

To further elaborate on what I am trying to do, there is a cassandra keyspace that I publish statistics with a timestamp. Periodically (hourly for example) this stats will be summarized using hive, once summarized that data will be stored separately with the corresponding hour. So when the query runs for the second time (and consecutive runs) the query should only run on the new data (i.e. - timestamp > previous_execution_timestamp). I am trying to do that by storing the latest executed timestamp in a separate hive table, and then use that value to filter out the raw stats.

Can this be achieved this using hive ?!

Upvotes: 0

Views: 12050

Answers (3)

Tutu Kumari
Tutu Kumari

Reputation: 503

It will work if you put in :

select * from TableA where TA_timestamp in (select timestmp from TableB where id="hourDim")

  • EXPLANATION : As > , < , = need one exact figure in the right side, while here we are getting multiple values which can be taken only with 'IN' clause.

Upvotes: 0

agentv
agentv

Reputation: 779

Looking at the business requirements underlying your question, it occurs that you might get more efficient results by partitioning your Hive table using hour. If the data can be written to use this factor as the partition key, then your query to update the summary will be much faster and require fewer resources.

Partitions can get out of hand when they reach the scale of millions, but this seems like a case that will not tease that limitation.

Upvotes: 0

YABADABADOU
YABADABADOU

Reputation: 1328

Subqueries inside a WHERE clause are not supported in Hive: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries

However, often you can use a JOIN statement instead to get to the same result: https://karmasphere.com/hive-queries-on-table-data#join_syntax

For example, this query:

   SELECT a.KEY, a.value
   FROM a
   WHERE a.KEY IN
   (SELECT b.KEY FROM B);

can be rewritten to:

   SELECT a.KEY, a.val
   FROM a LEFT SEMI JOIN b ON (a.KEY = b.KEY)

Upvotes: 5

Related Questions