Olsgaard
Olsgaard

Reputation: 1582

SQL Join on wildcard column / join on col1 and col2 if col1 in table else join on col2

Imagine I am a company that sells horoscopes, based on customer names. I have a table with surname, family name, and horoscope text. As I can't cover every single name combination, I often store the surname as NULL, as a catch-all value.

Horoscope DB

sur | fam | horoscope
----------------------
John| Doe  | text1
Jane| Doe  | text2
NULL| Doe  | text3
Ike | Smith| text4
NULL| Smith| text5

And a list of customers

customer DB

sur | fam
---------
John| Doe
Jack| Doe
Lisa| Smith
Carl| Smith

Now we need to match a horoscope to each customer. If we have a full match on surname and family name, we match on both, but if we don't have a full match, we match simply on surname, so the result will be:

Customer horoscope DB

sur | fam | horoscope
----------------------
John| Doe  | text1
Jack| Doe  | text3
Lisa| Smith| text5
Carl| Smith| text5

If I do a normal LEFT JOIN USING(sur, fam), I will only get a match on John. If I use LEFT JOIN USING(fam) I will get a lot of duplications. I need to setup some conditions, but I am not sure how.

I am willing to change my catch-all value if needed be, or encode it as a separate column.

Specifically I am working with Google Big Query. I have set up a DB-fiddle, that you are welcome to use

Upvotes: 2

Views: 249

Answers (5)

GMB
GMB

Reputation: 222432

Another solution would be to use conditional aggregation. You can JOIN on the family name, and then check if a horoscope exists for the given surname ; if not, then fallback on the NULL surname.

SELECT
    c.sur,
    c.fam,
    COALESCE(
        MAX(CASE WHEN c.sur = h.sur THEN h.text END),
        MAX(CASE WHEN h.sur IS NULL THEN h.text END) 
    ) horoscope_text
FROM
    customer c
    INNER JOIN horoscope h ON c.fam = h.fam
GROUP BY 
    c.sur,
    c.fam

Upvotes: 1

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172994

Below is for BigQuery Standard SQL

#standardSQL
SELECT c.sur, c.fam,
  ARRAY_AGG(horoscope ORDER BY h.sur DESC LIMIT 1)[OFFSET(0)] horoscope
FROM `project.dataset.customer` c
JOIN `project.dataset.horoscope` h
ON c.fam = h.fam
AND c.sur = IFNULL(h.sur, c.sur)
GROUP BY c.sur, c.fam

You can test, play with above using sample data from your example as in example below

#standardSQL
WITH `project.dataset.horoscope` AS (
  SELECT 'John' sur,'Doe' fam, 'text1' horoscope UNION ALL
  SELECT 'Jane', 'Doe', 'text2' UNION ALL
  SELECT NULL, 'Doe', 'text3' UNION ALL
  SELECT 'Ike', 'Smith', 'text4' UNION ALL
  SELECT NULL, 'Smith', 'text5' 
), `project.dataset.customer` AS (
  SELECT 'John' sur, 'Doe' fam UNION ALL
  SELECT 'Jack', 'Doe' UNION ALL
  SELECT 'Lisa', 'Smith' UNION ALL
  SELECT 'Carl', 'Smith' 
)
SELECT c.sur, c.fam,
  ARRAY_AGG(horoscope ORDER BY h.sur DESC LIMIT 1)[OFFSET(0)] horoscope
FROM `project.dataset.customer` c
JOIN `project.dataset.horoscope` h
ON c.fam = h.fam
AND c.sur = IFNULL(h.sur, c.sur)
GROUP BY c.sur, c.fam  

with result

Row sur     fam     horoscope    
1   John    Doe     text1    
2   Jack    Doe     text3    
3   Lisa    Smith   text5    
4   Carl    Smith   text5    

Upvotes: 1

Gro
Gro

Reputation: 1683

Based on what I understood, here is one way to do it

select c.id customer_id, c.sur, c.fam, h.id horoscope_id, h.sur h_sur, 
h.fam h_fam, h.horoscope
FROM customer c join horoscope h
on (c.sur = h.sur and c.fam = h.fam)
or (h.sur is null and c.fam = h.fam and not exists 
      (select 1 from horoscope h1 where h1.sur = c.sur and h1.fam = c.fam)
   )


and the result

enter image description here

Upvotes: 1

forpas
forpas

Reputation: 164089

You can join on multiple conditions to cover each case:

select c.sur, c.fam, h.horoscope from customer c 
inner join horoscope h
on (c.fam = h.fam and c.sur = h.sur) or 
  (c.fam = h.fam and h.sur is null and not exists(
    select 1 from horoscope 
    where fam = c.fam and sur = c.sur
  )
)

See the demo

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269623

Here is one method:

select . . .
from (select c.*,
             h.* except (sur, fam), -- whatever columns you want
             row_number() over (partition by c.fam
                                order by (case when c.sur = h.sur then 1 else 2 end)
                               ) as seqnum
      from horoscope h join
           customer c
           on c.fam = h.fam
     ) ch
where seqnum = 1;

Basically, this joins on the family and chooses the "best match" -- which is an exact match on the surname.

You should be careful, though, because different families can have the same family name.

Upvotes: 1

Related Questions