Reputation: 1582
Imagine I am a company that sells horoscopes, based on customer names. I have a table with surname, family name, and horoscope text. As I can't cover every single name combination, I often store the surname as NULL, as a catch-all value.
Horoscope DB
sur | fam | horoscope
----------------------
John| Doe | text1
Jane| Doe | text2
NULL| Doe | text3
Ike | Smith| text4
NULL| Smith| text5
And a list of customers
customer DB
sur | fam
---------
John| Doe
Jack| Doe
Lisa| Smith
Carl| Smith
Now we need to match a horoscope to each customer. If we have a full match on surname and family name, we match on both, but if we don't have a full match, we match simply on surname, so the result will be:
Customer horoscope DB
sur | fam | horoscope
----------------------
John| Doe | text1
Jack| Doe | text3
Lisa| Smith| text5
Carl| Smith| text5
If I do a normal LEFT JOIN USING(sur, fam)
, I will only get a match on John. If I use LEFT JOIN USING(fam)
I will get a lot of duplications. I need to setup some conditions, but I am not sure how.
I am willing to change my catch-all value if needed be, or encode it as a separate column.
Specifically I am working with Google Big Query. I have set up a DB-fiddle, that you are welcome to use
Upvotes: 2
Views: 249
Reputation: 222432
Another solution would be to use conditional aggregation. You can JOIN on the family name, and then check if a horoscope exists for the given surname ; if not, then fallback on the NULL surname.
SELECT
c.sur,
c.fam,
COALESCE(
MAX(CASE WHEN c.sur = h.sur THEN h.text END),
MAX(CASE WHEN h.sur IS NULL THEN h.text END)
) horoscope_text
FROM
customer c
INNER JOIN horoscope h ON c.fam = h.fam
GROUP BY
c.sur,
c.fam
Upvotes: 1
Reputation: 172994
Below is for BigQuery Standard SQL
#standardSQL
SELECT c.sur, c.fam,
ARRAY_AGG(horoscope ORDER BY h.sur DESC LIMIT 1)[OFFSET(0)] horoscope
FROM `project.dataset.customer` c
JOIN `project.dataset.horoscope` h
ON c.fam = h.fam
AND c.sur = IFNULL(h.sur, c.sur)
GROUP BY c.sur, c.fam
You can test, play with above using sample data from your example as in example below
#standardSQL
WITH `project.dataset.horoscope` AS (
SELECT 'John' sur,'Doe' fam, 'text1' horoscope UNION ALL
SELECT 'Jane', 'Doe', 'text2' UNION ALL
SELECT NULL, 'Doe', 'text3' UNION ALL
SELECT 'Ike', 'Smith', 'text4' UNION ALL
SELECT NULL, 'Smith', 'text5'
), `project.dataset.customer` AS (
SELECT 'John' sur, 'Doe' fam UNION ALL
SELECT 'Jack', 'Doe' UNION ALL
SELECT 'Lisa', 'Smith' UNION ALL
SELECT 'Carl', 'Smith'
)
SELECT c.sur, c.fam,
ARRAY_AGG(horoscope ORDER BY h.sur DESC LIMIT 1)[OFFSET(0)] horoscope
FROM `project.dataset.customer` c
JOIN `project.dataset.horoscope` h
ON c.fam = h.fam
AND c.sur = IFNULL(h.sur, c.sur)
GROUP BY c.sur, c.fam
with result
Row sur fam horoscope
1 John Doe text1
2 Jack Doe text3
3 Lisa Smith text5
4 Carl Smith text5
Upvotes: 1
Reputation: 1683
Based on what I understood, here is one way to do it
select c.id customer_id, c.sur, c.fam, h.id horoscope_id, h.sur h_sur,
h.fam h_fam, h.horoscope
FROM customer c join horoscope h
on (c.sur = h.sur and c.fam = h.fam)
or (h.sur is null and c.fam = h.fam and not exists
(select 1 from horoscope h1 where h1.sur = c.sur and h1.fam = c.fam)
)
and the result
Upvotes: 1
Reputation: 164089
You can join on multiple conditions to cover each case:
select c.sur, c.fam, h.horoscope from customer c
inner join horoscope h
on (c.fam = h.fam and c.sur = h.sur) or
(c.fam = h.fam and h.sur is null and not exists(
select 1 from horoscope
where fam = c.fam and sur = c.sur
)
)
See the demo
Upvotes: 1
Reputation: 1269623
Here is one method:
select . . .
from (select c.*,
h.* except (sur, fam), -- whatever columns you want
row_number() over (partition by c.fam
order by (case when c.sur = h.sur then 1 else 2 end)
) as seqnum
from horoscope h join
customer c
on c.fam = h.fam
) ch
where seqnum = 1;
Basically, this joins on the family and chooses the "best match" -- which is an exact match on the surname.
You should be careful, though, because different families can have the same family name.
Upvotes: 1