Remove duplicates by sorting and filtering

Question

I have data in my table which looks like the below :

INPUT :

version value  code   type
PMS    0.00    01    HOURS
000    312.00  01    HOURS
000    0.00    04    HOURS
PMS    0.00    01    NON STOCK
000    835.00  01    NON STOCK
000    835.00  04    NON STOCK

step 1 : sort it by code, which is 01 and 04.
step 2 : sort it by type, which is HOURS and NON STOCK.
step 3 : Retrieve by version, if 0.00 then retrieve the record.

Explanation for the output :

For code = 01, we have 4 records, 2 records for the type = HOURS and 2 records for type = NON STOCK. but choose the record with Version not equal to PMS(due to duplicates in column type). hence we get two outputs with code 01.

   000    312.00   01    HOURS
   000    835.00   01    NON STOCK

For code = 04, we have 2 records, 1 record for the type = HOURS and 1 record for type = NON STOCK. since we don't have duplicates for column type here, need not compare version in this case. hence we get two outputs with code 01

    000    0.00     04    HOURS
    000    835.00   04    NON STOCK

(The database Engine is Azure databricks)

EXPECTED OUTPUT :

version   value   code   type
   000    312.00   01    HOURS
   000    835.00   01    NON STOCK
   000    0.00     04    HOURS
   000    835.00   04    NON STOCK

mani bharataraju · Accepted Answer

Assuming that if there is no duplicate rows you need to pull the record even if the version is PMS. Try this:

select
* 
from(
select
a.*,
case when b.code is not null and version <>'PMS' then 1 
when b.code is null then 1
else 0 end as filter_val
from
input a
left outer join
(Select
code,
type
from input
group by 
code,
type having count(*) > 1) b
on a.code=b.code
and a.type=b.type) x
where filter_val=1;

Remove duplicates by sorting and filtering

Answers (2)

Related Questions