E. Diaz
E. Diaz

Reputation: 145

Merging 2 tables ignoring duplicates

I'm making a dictionary of names with gender asociated so I have a main table lets say:

**name_dict a**
name   gender
=======================
jhon   male
jane   female
anna   female

and a source data table, which has "duplicates", I mean, the same name, with different gender like this:

**name_source b**
name      gender
=======================
cameron   male
cameron   female
anna      female
travis    male

I would like to merge this two tables with this conditions

  1. ignoring anna (done in the merge condition a.name=b.name)
  2. ignoring the cameron entries (this is where I'm stuck)

How would I create my merge so I get this result?

name      gender
----------------
jhon      male
jane      female
anna      female
travis    male

I really appreciate all your help and suggestions!

Edit------------------------------------------------------------------------ so, here is what i created with my inspiration

merge into name_dictionary x using(
    select a.name,a.gender from name_source a, (select name,count(1) from name_source group by name having count(1)>1 order by count(1)) b
    where a.name=b.name
    ) y
    on (x.name=y.name)
    when not matched then
    insert (name,gender)
    values (y.name,y.gender)

Then i said, lets test it against our friend Thomas Tschernich's sugestion, for that i used:

insert into name_dictionary
    select name,gender
    from name_source t1
    where
        (t1.name, t1.gender) not in (
            select name, gender from name_dictionary
        )
        and
        (t1.name, t1.gender) not in (
            select t2.name, t2.gender
            from name_source t2
            join name_source t3 on (t2.name = t3.name and t2.gender != t3.gender)
        );

then runned both against each other and got:

QUERY      EXEC TIME    FINAL ROWS  PLAN DATA
merge      2 secs        96,070         MERGE STATEMENT ALL_ROWS Cost: 253 Bytes: 46,752 Cardinality: 974 
c-Insert    killed (31m)          ¿?            INSERT STATEMENT ALL_ROWS Cost: 24,656,135 Bytes: 1,051,700 Cardinality: 105,170 

This is the info on the tables i used:

Table          Initial Rows            Observations 
name_dictionary 3,097           The ones already inserted   
name_source     101,205         The ones i  want to filter and add to the name_dictionary

(couldn't format it right, hope its readable) anyways, i hope you can elaborate if its right or i missed something, Thanks a lot!!!

---new findings if i remove the order by in the merge the cost goes up to 298;

Upvotes: 0

Views: 1275

Answers (1)

Thomas Tschernich
Thomas Tschernich

Reputation: 1282

It is probably easier to use two separate inserts than a merge. First, insert all the entries from table a, like this:

insert into name_new select * from name_dict

Then, do a conditional insert for your second table, like this:

insert into name_new
    select *
    from name_source t1
    where
        (t1.name, t1.gender) not in (
            select name, gender from name_new
        )
        and
        (t1.name, t1.gender) not in (
            select t2.name, t2.gender
            from name_source t2
            join name_source t3 on (t2.name = t3.name and t2.gender != t3.gender)
        )

The first where section sorts out the anna-case, the second one will sort out the duplicates with both genders.

Upvotes: 1

Related Questions