Reputation: 61
I am trying to create a table as select from another table which is creating duplicate records. Meaning my SQL statement is returning me approx 11 million rows and my new table is returning me 15 million rows.
Instead of using create table as, I have also tried to create a table table and insert into select ...
I have been able to pinpoint some records that are being duplicated but have no idea why because in my select statement the records was present only once.
Create table Statement :
Create Consolidated_Table as
select b.*
from transaction_Table b,
table_reference c
where ( ref_1 in (c.field_1, c.field_2, c.field_3)
or ref_2 in (c.field_1, c.field_2, c.field_3)
)
The number of records from my select statement & my new table should tally.
Upvotes: 1
Views: 1567
Reputation: 50017
I suggest you rewrite your SQL as:
SELECT tt.*
FROM TRANSACTION_TABLE tt
WHERE tt.REF_1 IN (SELECT FIELD_1 FROM TABLE_REFERENCE UNION ALL
SELECT FIELD_2 FROM TABLE_REFERENCE UNION ALL
SELECT FIELD_3 FROM TABLE_REFERENCE) OR
tt.REF_2 IN (SELECT FIELD_1 FROM TABLE_REFERENCE UNION ALL
SELECT FIELD_2 FROM TABLE_REFERENCE UNION ALL
SELECT FIELD_3 FROM TABLE_REFERENCE)
The difference between the two queries is that your query is joining TRANSACTION_TABLE
and TABLE_REFERENCE
, and will create multiple rows when there are multiple hits on the join criteria. The query shown above only gives you one row back from TRANSACTION_TABLE regardless of how many matches are found for it in TABLE_REFERENCE.
Note that this query highlights an issue which may not be apparent in the original query. Both of them will give you back all rows from TRANSACTION_TABLE
where the values in the REF_1
or REF_2
fields appear in the FIELD_1
, FIELD_2
, or FIELD_3
columns of any row in the TABLE_REFERENCE table. If that's what you wanted, that's great - but if it's not it might explain why the results you're getting don't match the results you expected.
Upvotes: 0
Reputation: 142778
That's because where
clause doesn't restrict number of rows enough.
Here's an example based on Scott's emp
and dept
tables which certainly don't contain millions of rows, but - what they do contain is enough to show what's probably going on.
DEPT
table contains 4 rows, which represents your transaction_table
. It means that you'd expect the result to contain 4 rows as well, right?
SQL> select * from dept;
DEPTNO DNAME LOC
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON
OK; now, your query applied to Scott's tables:
SQL> select d.*
2 from emp e,
3 dept d
4 where e.job in ('CLERK', 'MANAGER')
5 or e.deptno in (10, 20);
DEPTNO DNAME LOC
---------- -------------- -------------
10 ACCOUNTING NEW YORK
10 ACCOUNTING NEW YORK
10 ACCOUNTING NEW YORK
10 ACCOUNTING NEW YORK
10 ACCOUNTING NEW YORK
10 ACCOUNTING NEW YORK
10 ACCOUNTING NEW YORK
10 ACCOUNTING NEW YORK
10 ACCOUNTING NEW YORK
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
20 RESEARCH DALLAS
20 RESEARCH DALLAS
20 RESEARCH DALLAS
20 RESEARCH DALLAS
20 RESEARCH DALLAS
20 RESEARCH DALLAS
20 RESEARCH DALLAS
20 RESEARCH DALLAS
20 RESEARCH DALLAS
30 SALES CHICAGO
30 SALES CHICAGO
30 SALES CHICAGO
30 SALES CHICAGO
30 SALES CHICAGO
30 SALES CHICAGO
30 SALES CHICAGO
30 SALES CHICAGO
30 SALES CHICAGO
30 SALES CHICAGO
40 OPERATIONS BOSTON
40 OPERATIONS BOSTON
40 OPERATIONS BOSTON
40 OPERATIONS BOSTON
40 OPERATIONS BOSTON
40 OPERATIONS BOSTON
40 OPERATIONS BOSTON
40 OPERATIONS BOSTON
40 OPERATIONS BOSTON
40 OPERATIONS BOSTON
40 rows selected.
SQL>
It seems that you'll have to think it over. Maybe distinct
fixes it; I can't tell, but you can try.
SQL> select distinct d.*
2 from emp e,
3 dept d
4 where e.job in ('CLERK', 'MANAGER')
5 or e.deptno in (10, 20);
DEPTNO DNAME LOC
---------- -------------- -------------
20 RESEARCH DALLAS
40 OPERATIONS BOSTON
10 ACCOUNTING NEW YORK
30 SALES CHICAGO
SQL>
Upvotes: 1
Reputation: 1522
I think your Query Logic should be Like this...
INSERT INTO NewTable
(
rollNo,NAME,marks
)
SELECT t1.rollNo,
t1.NAME,
t1.marks
FROM OldTable t1
WHERE NOT EXISTS
(
SELECT rollNo FROM NewTable t2
WHERE t2.rollNo=t1.rollNo
)
Upvotes: 0