nLee
nLee

Reputation: 1385

Comparing Similar Hive Tables

I have two hive tables (t1 and t2) that I would like to compare. The second table has 5 additional columns that are not in the first table. Other than the five disjoint fields, the two tables should be identical. I am trying to write a query to check this. Here is what I have so far:

SELECT * FROM t1 
UNION ALL
select * from t2
GROUP BY some_value
HAVING count(*) == 2

If the tables are identical, this should return 0 records. However, since the second table contains 5 extra fields, I need to change the second select statement to reflect this. There are almost 60 column names so I would really hate to write it like this:

SELECT * FROM t1 
UNION ALL
select field1, field2, field3,...,fieldn from t2
GROUP BY some_value
HAVING count(*) == 2

I have looked around and I know there is no select * EXCEPT syntax, but is there a way to do this query without having to explicity name each column that I want included in the final result?

Upvotes: 1

Views: 1119

Answers (2)

Sourygna
Sourygna

Reputation: 719

You could have a look at this Python program that handles such comparisons of Hive tables (comparing all the rows and all the columns), and would show you in a webpage the differences that might appear: https://github.com/bolcom/hive_compared_bq

To skip the 5 extra fields, you could use the "--ignore-columns" option.

Upvotes: 0

shranta rout
shranta rout

Reputation: 1

You should have used UNION DISTINCT for the logic you are applying.

However, the number and names of columns returned by each select_statement have to be the same otherwise a schema error is thrown.

Upvotes: 0

Related Questions