joe
joe

Reputation: 173

To find subset of data in Oracle

I have some records in emp1 :

SELECT distinct 
substrb(emp.employee_NAME,1,50) employee_NAME 
FROM employee emp , employee_sites sites , (SELECT DISTINCT employee_id ,
emp_site_number
FROM abc
) abc
where emp.employee_id = sites.employee_id
and abc.employee_id=emp.employee_id
and abc.emp_site_number = sites.emp_site_number ;

and some records in emp:

SELECT  distinct emp.employee_NAME employee_NAME 
FROM employee emp
WHERE 1=1 and EXISTS
(SELECT 1 FROM employee_ACCOUNTS acc WHERE acc.employee_id = emp.employee_id
) 

rowcount of emp : 205001 rowcount of emp1 : 18003

I want to find out if emp has all the records of emp1 ,in other words if emp is superset of emp1. I tried this :

select count(*)  from (SELECT  distinct emp.employee_NAME employee_NAME 
FROM employee emp
WHERE 1=1 and EXISTS
(SELECT 1 FROM employee_ACCOUNTS acc WHERE acc.employee_id = emp.employee_id
) ) emp ,
(SELECT distinct 
substrb(emp.employee_NAME,1,50) employee_NAME 
FROM employee emp , employee_sites sites , (SELECT DISTINCT employee_id ,
emp_site_number
FROM abc
) abc
where emp.employee_id = sites.employee_id
and abc.employee_id=emp.employee_id
and abc.emp_site_number = sites.emp_site_number)  emp1
where emp.employee_NAME = emp1.employee_NAME ;

Rowcount for the above query : 12360. So I have concluded that emp is not a superset of emp1

Someone please let me know what I have done is fine or it needs some modification. Also please share if you know some better way of doing it .

Thanks

Upvotes: 0

Views: 481

Answers (1)

Lord Peter
Lord Peter

Reputation: 3501

You could avoid the correlated subqueries and just do a simple set MINUS operation:

select employee_name -- or whatever makes the employee the same in 2 tables
from emp1 -- the table which may have rows not in the other table

MINUS

select employee_name
from emp2 -- the table which you think may be missing some rows

You could also use a left join:

select emp2.employee_name from emp2
left join emp1 on emp2.employee_name = emp1.employee_name
where emp1.employee_name is null

The performance will depend on factors like indexes, data volumes. Inspection of the query plans and benchmarking will give you a good idea of which is the better option.

Upvotes: 2

Related Questions