Mike
Mike

Reputation: 139

SQL INNER JOIN duplicates to be remove based on criteria

I have the below two tables:

Table 1

┌──────────┬────────────┬───────────────┐
│ account1 │ Fruit_name │ First_harvest │
├──────────┼────────────┼───────────────┤
│      567 │ Apple      │        201805 │
│      432 │ Mango      │        201809 │
│      567 │ Apple      │        201836 │
└──────────┴────────────┴───────────────┘

Table 2

┌──────────┬─────────────┬──────────────┬───────────────┬──────────────┬─────────────┐
│ account1 │ Fruit_name  │ Current_Farm │ Previous_Farm │ FirstHarvest │ LastHarvest │
├──────────┼─────────────┼──────────────┼───────────────┼──────────────┼─────────────┤
│      567 │ Apple       │ APFarm       │ AppleYard     │       201801 │      201810 │
│      567 │ Apple       │ APFarm       │ FruitFarm     │       201805 │      201830 │
│      567 │ Apple       │ APFarm       │ FruitMarket   │       201831 │      999999 │
│      567 │ Royal Gala  │ APFarm       │ GrocerWorld   │       201815 │      999999 │
└──────────┴─────────────┴──────────────┴───────────────┴──────────────┴─────────────┘

My code:

SELECT DISTINCT a.account1,a.fruit_name,Max(a.first_harvest) first_harvest,b.current_farm,b.previous_farm,b.firstharvest,b.lastharvest
FROM   fruit_harvest_data a 
  INNER JOIN fruit_farm_data b 
    ON a.account1 = b.account1 
    AND CASE WHEN b.fruit_name = 'Apple'    THEN 'Royal Gala'
            ELSE b.fruit_name END =
                CASE WHEN a.fruit_name = 'Apple' THEN 'Royal gala'
                ELSE a.fruit_name END
    WHERE  a.first_harvest BETWEEN b.firstharvest AND b.lastharvest 
GROUP  BY a.account1,a.fruit_name,b.current_farm,b.previous_farm,b.firstharvest,b.lastharvest 
HAVING Max(a.first_harvest) >= 201801 

Result:

┌──────────┬────────────┬───────────────┬──────────────┬───────────────┬──────────────┬─────────────┐
│ account1 │ Fruit_name │ First_harvest │ Current_Farm │ Previous_Farm │ FirstHarvest │ LastHarvest │
├──────────┼────────────┼───────────────┼──────────────┼───────────────┼──────────────┼─────────────┤
│      567 │ Apple      │        201836 │ APFarm       │ FruitMarket   │       201831 │      999999 │
│      567 │ Royal Gala │        201836 │ APFarm       │ GrocerWorld   │       201815 │      999999 │
└──────────┴────────────┴───────────────┴──────────────┴───────────────┴──────────────┴─────────────┘

Request:

I get duplicate data due to the way we have this stored. Is there a way to only show the result if account1 has both Apple and Royal Gala then it should only select Royal Gala.

Please note: account1 eg., 567 can have multiple fruits like apple, roya gal, mango, orange. but should only select Royal gala in case if exists in both apple and royal gala.

Upvotes: 0

Views: 60

Answers (2)

TomC
TomC

Reputation: 2814

Still unclear about what you want in your result set - a more complete desired result would help, but to answer the question as to how to do it:

Since you have mentioned that Apple/Gala is an example, I would create a new table to contain these pairs:

create table replace_list(oldfruit varchar(20), newfruit varchar(20))
insert replace_list values ('Apple','Royal Gala')

Then in your query add this:

left join replace_list r on r.oldfruit=b.fruit_name
left join fruit_farm_data n on n.account1=a.account1 and n.fruit_name=newfruit

and in your where clause, you will check where either the fruit name does not have a replacement r.oldfruit is null or it does have a replacement, but the farm doesnt have that fruit n.fruit_name is null

where r.oldfruit is null or n.fruit_name is null

The rest of the query you can work out for yourself.

Upvotes: 0

Zaynul Abadin Tuhin
Zaynul Abadin Tuhin

Reputation: 31991

I think below should work

select distinct T.* from 
        (SELECT DISTINCT a.account1,
     case when a.fruit_name='Apple' or a.fruit_name='Royal Gala' then
    'Apple' else a.fruit_name end as fruit_name  ,Max(a.first_harvest) first_harvest,b.current_farm,b.previous_farm,b.firstharvest,b.lastharvest
        FROM   fruit_harvest_data a 
          INNER JOIN fruit_farm_data b 
            ON a.account1 = b.account1 
            AND CASE WHEN b.fruit_name = 'Apple'    THEN 'Royal Gala'
                    ELSE b.fruit_name END =
                        CASE WHEN a.fruit_name = 'Apple' THEN 'Royal gala'
                        ELSE a.fruit_name END
            WHERE  a.first_harvest BETWEEN b.firstharvest AND b.lastharvest 
        GROUP  BY a.account1,a.fruit_name,b.current_farm,b.previous_farm,b.firstharvest,b.lastharvest 
        HAVING Max(a.first_harvest) >= 201801 
    ) as  T

Upvotes: 1

Related Questions