hlagos
hlagos

Reputation: 7957

Hive - compare data between tables

I need to compare data between 2 external tables (A and A') with the same columns and i actually make some sql scripts to do this but is not enough efficient and robust. there are a way to do this with UDAF or external tools?

Here one sample:

Table A

+----+----+----+
| C1 | C2 | C3 |
+----+----+----+
| 01 | A  | 02 |
| 02 | B  | 03 |
| 03 | C  | 04 |
+----+----+----+

Table A'

+----+----+----+
| C1 | C2 | C3 |
+----+----+----+
| 01 | B  | 02 |
| 02 | B  | 03 |
| 04 | D  | 04 |
+----+----+----+

Output

+------+------+------+------+------+------+
|  C1  |  C2  |  C3  | C1'  | C2'  | C3'  |
+------+------+------+------+------+------+
| 01   | A    | 02   | 01   | B    | 02   |
| 03   | C    | 04   | null | null | null |
| null | null | null | 04   | D    | 04   |
+------+------+------+------+------+------+

Upvotes: 0

Views: 2421

Answers (3)

prashanth
prashanth

Reputation: 445

There is a new tool to compare data. DiffTool (https://dataq.io/) is a UI based data comparison tool that can be used across data sources like RDBMS, Hadoop(Hive) or text files (CSV & JSON). DiffTool Compares datasets based on key(s), and some of the features include:

controlling data volume using custom filters, Transforming columns using SQL expressions, The ability to scale using distributed architecture, And analyzing results intuitively with rich visualization

Upvotes: 1

Venkat
Venkat

Reputation: 1810

You could do this as a UDF if the comparison id b/w rows from table A and table A' based on a join condition which in this case i believe is the first column C1.

Approach:

select UDF_FUNCTION(a.c1,a.c2.a.c3, A'.c2,a'.c3) from A JOIN A' ON (A.c1=a'.c1)

In the UDF you can compare the values as per your requirement and return an array or delimited text.

Also what do you mean by

some sql scripts to do this but is not enough efficient and robust

Upvotes: 0

Syed Abdul Kather
Syed Abdul Kather

Reputation: 618

You have to write the custom MR Program for this. We have already had these kind of problems. You have to write a custom MR Program are custom UDF. There is no in build method to solve this problem.

Upvotes: 0

Related Questions