Comparing records in two Hive tables having the same schema

Question

I have two tables in Hive with exactly the same schema and the same number of rows. I need to compare the individual column values between the two tables. If any value in a record does not match, the entire row should be returned as output. The tables have approximately 358 columns and millions of records.

yahoo · Accepted Answer

This is what you can do:

Join the two tables on their unique key (I assume your tables have a unique identifier), then use Hive's hash function over all the columns combined to find the rows that differ. The query will look like this:

select * from tab1 a join tab2 b
on a.id = b.id
where hash(a.col1, a.col2....) <> hash(b.col1, b.col2...);
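
With around 358 columns, typing both hash() argument lists by hand is error-prone. Below is a minimal sketch in Python that builds the query text from a list of column names. The function name build_diff_query and the sample column list are hypothetical and not part of the original answer; in practice the list could be copied from the output of DESCRIBE tab1.

```python
def build_diff_query(left, right, key, columns):
    """Return HiveQL text that lists rows whose non-key columns differ."""
    # Compare every column except the join key itself.
    compared = [c for c in columns if c != key]
    if not compared:
        raise ValueError("no columns to compare besides the key")
    left_args = ", ".join(f"a.{c}" for c in compared)
    right_args = ", ".join(f"b.{c}" for c in compared)
    return (
        f"SELECT a.*, b.*\n"
        f"FROM {left} a\n"
        f"JOIN {right} b ON a.{key} = b.{key}\n"
        f"WHERE hash({left_args}) <> hash({right_args})"
    )


if __name__ == "__main__":
    # Hypothetical column list, used only for illustration.
    print(build_diff_query("tab1", "tab2", "id", ["id", "col1", "col2"]))
```

Keep in mind that Hive's hash() returns a 32-bit integer, so over millions of rows two different rows could occasionally produce the same hash and be missed. If that matters, one stricter option is to have the generator emit a null-safe comparison per column (a.col1 <=> b.col1) instead of the two hash() calls.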
