Crash Bandicoot
Crash Bandicoot

Reputation: 61

Create table as select from is creating duplicate records

I am trying to create a table as select from another table which is creating duplicate records. Meaning my SQL statement is returning me approx 11 million rows and my new table is returning me 15 million rows.

Instead of using create table as, I have also tried to create a table table and insert into select ...

I have been able to pinpoint some records that are being duplicated but have no idea why because in my select statement the records was present only once.

Create table Statement :

Create Consolidated_Table as 
select b.* 
from transaction_Table b,
     table_reference c 
where (   ref_1 in (c.field_1, c.field_2, c.field_3) 
       or ref_2 in (c.field_1, c.field_2, c.field_3)
      )

The number of records from my select statement & my new table should tally.

Upvotes: 1

Views: 1567

Answers (3)

I suggest you rewrite your SQL as:

SELECT tt.*
  FROM TRANSACTION_TABLE tt
  WHERE tt.REF_1 IN (SELECT FIELD_1 FROM TABLE_REFERENCE UNION ALL
                     SELECT FIELD_2 FROM TABLE_REFERENCE UNION ALL
                     SELECT FIELD_3 FROM TABLE_REFERENCE) OR
        tt.REF_2 IN (SELECT FIELD_1 FROM TABLE_REFERENCE UNION ALL
                     SELECT FIELD_2 FROM TABLE_REFERENCE UNION ALL
                     SELECT FIELD_3 FROM TABLE_REFERENCE)

The difference between the two queries is that your query is joining TRANSACTION_TABLE and TABLE_REFERENCE, and will create multiple rows when there are multiple hits on the join criteria. The query shown above only gives you one row back from TRANSACTION_TABLE regardless of how many matches are found for it in TABLE_REFERENCE.

Note that this query highlights an issue which may not be apparent in the original query. Both of them will give you back all rows from TRANSACTION_TABLE where the values in the REF_1 or REF_2 fields appear in the FIELD_1, FIELD_2, or FIELD_3 columns of any row in the TABLE_REFERENCE table. If that's what you wanted, that's great - but if it's not it might explain why the results you're getting don't match the results you expected.

Upvotes: 0

Littlefoot
Littlefoot

Reputation: 142778

That's because where clause doesn't restrict number of rows enough.

Here's an example based on Scott's emp and dept tables which certainly don't contain millions of rows, but - what they do contain is enough to show what's probably going on.

DEPT table contains 4 rows, which represents your transaction_table. It means that you'd expect the result to contain 4 rows as well, right?

SQL> select * from dept;

DEPTNO DNAME          LOC

    10 ACCOUNTING     NEW YORK
    20 RESEARCH       DALLAS
    30 SALES          CHICAGO
    40 OPERATIONS     BOSTON

OK; now, your query applied to Scott's tables:

SQL> select d.*
  2  from emp e,
  3       dept d
  4  where e.job in ('CLERK', 'MANAGER')
  5     or e.deptno in (10, 20);

    DEPTNO DNAME          LOC
---------- -------------- -------------
        10 ACCOUNTING     NEW YORK
        10 ACCOUNTING     NEW YORK
        10 ACCOUNTING     NEW YORK
        10 ACCOUNTING     NEW YORK
        10 ACCOUNTING     NEW YORK
        10 ACCOUNTING     NEW YORK
        10 ACCOUNTING     NEW YORK
        10 ACCOUNTING     NEW YORK
        10 ACCOUNTING     NEW YORK
        10 ACCOUNTING     NEW YORK
        20 RESEARCH       DALLAS
        20 RESEARCH       DALLAS
        20 RESEARCH       DALLAS
        20 RESEARCH       DALLAS
        20 RESEARCH       DALLAS
        20 RESEARCH       DALLAS
        20 RESEARCH       DALLAS
        20 RESEARCH       DALLAS
        20 RESEARCH       DALLAS
        20 RESEARCH       DALLAS
        30 SALES          CHICAGO
        30 SALES          CHICAGO
        30 SALES          CHICAGO
        30 SALES          CHICAGO
        30 SALES          CHICAGO
        30 SALES          CHICAGO
        30 SALES          CHICAGO
        30 SALES          CHICAGO
        30 SALES          CHICAGO
        30 SALES          CHICAGO
        40 OPERATIONS     BOSTON
        40 OPERATIONS     BOSTON
        40 OPERATIONS     BOSTON
        40 OPERATIONS     BOSTON
        40 OPERATIONS     BOSTON
        40 OPERATIONS     BOSTON
        40 OPERATIONS     BOSTON
        40 OPERATIONS     BOSTON
        40 OPERATIONS     BOSTON
        40 OPERATIONS     BOSTON

40 rows selected.

SQL>

It seems that you'll have to think it over. Maybe distinct fixes it; I can't tell, but you can try.

SQL> select distinct d.*
  2  from emp e,
  3       dept d
  4  where e.job in ('CLERK', 'MANAGER')
  5     or e.deptno in (10, 20);

    DEPTNO DNAME          LOC
---------- -------------- -------------
        20 RESEARCH       DALLAS
        40 OPERATIONS     BOSTON
        10 ACCOUNTING     NEW YORK
        30 SALES          CHICAGO

SQL>

Upvotes: 1

THE LIFE-TIME LEARNER
THE LIFE-TIME LEARNER

Reputation: 1522

I think your Query Logic should be Like this...

INSERT INTO NewTable
(
    rollNo,NAME,marks
)
SELECT  t1.rollNo,
        t1.NAME,
        t1.marks
FROM OldTable t1
WHERE NOT EXISTS
(
    SELECT rollNo FROM NewTable t2
    WHERE t2.rollNo=t1.rollNo
)   

Upvotes: 0

Related Questions