Macopare
Macopare

Reputation: 155

Minus query in HIVE

Minus query seems to not work in HIVE.

Tried ex:

select x from abc 
minus 
select x from bcd ; 

Am I doing this wrong or minus query isn't defined for HIVE? If so, is there any other way to get the result for this?

Upvotes: 9

Views: 78514

Answers (6)

Indent
Indent

Reputation: 4967

Since version 2.2.0 (release of 25 July 2017), Hive support the following operations :

  • intersect [distinct|all]
  • except [distinct|all]
  • minus [distinct|all]

HIVE-12765

For information, except and minus are equivalent. Some database accept only once, Hive accepts both.

By default (SQL Standard) the distinct clause is applied.

A good practice is to always be explicit with the all and distinct clauses. (same advice for union operation)

If you don't need to operate a costly distinct operation to eliminate duplicates rows, prefer the all clause which preserves all the rows.

Upvotes: 3

Marco Polo
Marco Polo

Reputation: 738

HQL DOES NOT support minus but you can always use Patrick Tucci solution which works fine when your select-list contains only a few fields. In my case I wanted to find the differences between an entire table (30+ fields) and a backup copy to find records that were different. Here is my solution:

select <all-my-fields>, count(*)
  from (
        select <all-my-fields> from mytable
        union all
        select <all-the-fields> from mybackuptable
       ) merged_data
group by <all-my-fields>
having count(*) = 1

Now this is not completly a "minus" since single records from mybackuptable would show up in the result which is what I wanted. To make it a complete "minus" equivalent I've added this:

select <all-my-fields>
  from (
        select max(source) source, <all-my-fields>, count(*)
          from (
                select 1 source, <all-my-fields> from mytable
                union all
                select 2 source, <all-the-fields> from mybackuptable
               ) merged_data
        group by <all-my-fields>
        having count(*) = 1
       ) minus_data
 where source = 1

Upvotes: 13

GMc
GMc

Reputation: 1774

The above query might be easier to read if it was formatted slightly differently:

SELECT col1
FROM table1
WHERE NOT EXISTS (
    SELECT col2
    FROM table2 
    WHERE table1.col1 = table2.col2
)

This is a correlated sub-query. That is, the outer query is correlated to the inner (sub) query via the join contained in the subquery that references the outer query.

After joining tbl1 to tbl2, the result of that join (i.e. in this case, the list of col2 values that exist both in table1.col1 and table2.col2) the NOT Exists is applied to remove those values from the list of col1 values in Table1. The result is the list of col1 values in table1 that do not appear in table2.

Any "extra" values in Table2 that do not exist in table1 do not play a part in this query as they are removed by the inner join. This is fine, because the query is trying to return just those values in table1 that are not in table2 - extra values in table2 aren't in table1 in the first place - hence they are irrelevant.

Upvotes: 2

Rup Sen
Rup Sen

Reputation: 1

NOT EXISTS is supported in HIVE QL.

Example:

SELECT col1 FROM table1 WHERE NOT EXISTS SELECT col2 FROM table2 
WHERE table1.col1 = table2.col2

Upvotes: 0

Patrick Tucci
Patrick Tucci

Reputation: 1952

It does not appear that HQL supports the MINUS operator. See this relevant, albeit a bit old, resource:

http://www.quora.com/Apache-Hive/What-are-the-biggest-feature-gaps-between-HiveQL-and-SQL

What you want to do can be done with a LEFT JOIN or NOT EXISTS:

SELECT x
FROM abc
LEFT JOIN bcd
ON abc.x = bcd.x
WHERE bcd.x IS NULL

EDIT: Per comments below, NOT EXISTS is not supported.

SELECT x 
FROM abc
WHERE NOT EXISTS (SELECT x FROM bcd)

Upvotes: 23

brandon.bell
brandon.bell

Reputation: 1411

At this time, Hive does not support the MINUS functionality. See http://www.quora.com/Apache-Hive/What-are-the-biggest-feature-gaps-between-HiveQL-and-SQL

Upvotes: 0

Related Questions