user2531569
user2531569

Reputation: 619

Compare two tables of data in HIVE

I have to find out if data in both the tables is same for a given view_date. If same my SQL should return zero, else non zero.

Table1/Table2 columns:

Source
view_date
count
start_date
end_date

I tried in the below way:

SELECT *
FROM (
    SELECT count(*)
    FROM table1
    ) a
JOIN (
    SELECT count(*)
    FROM TABLE 2
    ) b
WHERE view_date = '05/08/2016'
    AND a.x != b.y;

But I am not getting the expected result. Could someone please help me?

Upvotes: 0

Views: 4065

Answers (2)

Sourygna
Sourygna

Reputation: 719

To do a full comparison of 2 tables, you not only need to make sure that the number of rows match, but you must check that all the data in all the columns for all the rows match!

This can be a complicated problem (when I worked at Hortonworks, for 1 project we developed 3 different programs to try to solve this). Lately I had the opportunity to develop a program that solves this in an elegant and efficient way: https://github.com/bolcom/hive_compared_bq

The program shows you the differences in a webpage (which is something you could skip if you don't need it) and also gives you a return value 0/1 which is what you currently want.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269773

Here is one method that counts the number of rows that are unique in each table:

select count(*)
from (select source, count, start_date, end_date,
             min(which) as minwhich, max(which) as maxwhich
      from ((select source, count, start_date, end_date, 1 as which
             from table1 
             where viewdate = '2016-06-08'
            ) union all
            (select source, count, start_date, end_date, 2 as which
             from table2
             where viewdate = '2016-06-08'
            )
           ) t12
      group by source, count, start_date, end_date
      having minwhich = maxwhich
     ) t;

Note: If rows are duplicated across all values in a table, this does not check that the same number of duplicates are in each table.

Upvotes: 1

Related Questions