Ironman
Ironman

Reputation: 1410

Find if a record exists in Hive table

I want to insert a record in hive table but before inserting it i want to check if it that record already exists in the table. that case i won't insert the record and would rather skip it.

Hive version i am using is Hive 1.1.0.

Table structure:

      hk                | diagnosisid  |    id   | cdcode
-----------------------------------------------------------
EXVWLOORV@#19690321F    |       1      |   7810  |    I9C
OHQQDUG@#19380630F      |       1      |   3643  |    I9C
VPLWK@#19610120F        |       1      |   2731  |    I9C

before inserting the record i want to check if the hk value is already present or not in the hive table.

Upvotes: 0

Views: 3211

Answers (1)

user8406805
user8406805

Reputation:

Approach:::1:

If you can write the "IF and LOOP" then you can try this approach.

select count(1) as cnt
from yourtable
where hk='new_value';

if (cnt=0){
---your insert query will be here
}else {
   --some message... "hk" already present in table.
}

Approach:::2:

INSERT INTO your_table_name
      (hk, diagnosisid, id, cdcode)
SELECT 'asdf' as hk,
       'asdf' as diagnosisid,
       'ss'   as id,
       'ddww' as cdcode
 WHERE 'asdf' 
        NOT IN (SELECT hk FROM your_table_name)

Approach:::3:

Then finally you have to write ETL script to perform the actions.

Solution:

In your scenario: (as per table provided in comments) Query will be:

create table pid1(hk string,diagnosisid string,id string,cdcode string);
insert into pid1 values('EXVWLOORV@#19690321F','1','7810','I9C');
insert into pid1 values('OHQQDUG@#19380630F','1','3643','I9C');
insert into pid1 values('VPLWK@#19610120F','1','2731','I9C');

create table pid as select * from pid1 limit 1;

INSERT INTO pid
      (hk, diagnosisid, id, cdcode)
SELECT hk,
       diagnosisid,
       id,
       cdcode
from pid1
 WHERE hk NOT IN (SELECT hk FROM pid);

All the best!!!

Upvotes: 1

Related Questions