Sara
Sara

Reputation: 322

Compare one value of column A with all the values of column B in Hive HQL

I have two columns in one table say Column A and Column B. I need to search each value of Column A with All the values of column B each and every time and return true if the column A value is found in any of the rows of column B. How can i get this?

I have tried using the below command:

select column _A, column_B,(if (column_A =column_B), True, False) as test from sample;

If i use the above command, it is checking for that particular row alone. But I need true value, if a value of column A is found in any of the rows of column B. How can i can check one value of column A with all the all the values of column B?

Or Is there any possibility to iterate and compare each value between two columns?

Upvotes: 1

Views: 3040

Answers (2)

sujit
sujit

Reputation: 2328

Solution

create temporary table t as select rand() as id, column_A, column_B from sample; --> Refer 1

select distinct t3.id,t3.column_A,t3.column_B,t3.match from ( --> Refer 3

select t1.id as id, t1.column_A as column_A, t1.column_B as column_B,--> Refer 2 
if(t2.column_B is null, False, True) as match  from t t1 LEFT OUTER JOIN
   t t2  ON t1.column_A = t2.column_B 

) t3;

Explanation

  1. Create an identifier column to keep track of the rows in original table. I am using rand() here. We will take advantage of this to get the original rows in Step 3. Creating a temporary table t here for simplicity in next steps.
  2. Use a LEFT OUTER JOIN with self to do your test that requires matching each column with another across all rows, yielding the match column. Note that here multiple duplicate rows may get created than in Sample table, but we have got a handle on the duplicates, since the id column for them will be same.

  3. In this step, we apply distinct to get the original rows as in Sample table. You can then ditch the id column.

Notes

  1. Self joins are costly in terms of performance, but this is unavoidable for solution to the question.
  2. The distinct used in Step 3, is costly too. A more performant approach would be to use Window functions where we can partition by the id and pick the first row in the window. You can explore that.

Upvotes: 1

jose_bacoy
jose_bacoy

Reputation: 12684

You can do a left join to itself and check if the column key is null. If it is null, then that value is not found in the other table. Use if or "case when" function to check if it is null or not.

Select t1.column_A,
          t1.column_B,
         IF(t2.column_B is null, 'False', 'True') as test
from Sample t1
Left Join Sample t2
On t1.column_A = t2.column_B;

Upvotes: 0

Related Questions