Reputation: 388
I can do this programmatically, but was looking for a cleaner solution.
Let's say I have the following table:
First Name Last Name
Smith Albert
Smith Alphonse
Smith Jason
Johnson Charles
Roberts Chris
Roberts Christian
I want to a create a unique with the following rules
For Albert Smith I would return Alb.Smith
For Charles Johnson I would return Johnson
For Christion Roberts I would return Christ.Roberts
Does anyone have any thoughts on how to accomplish this directly in an Oracle SQL statement or should I stick to doing it in a program?
Upvotes: 2
Views: 1205
Reputation: 191455
Version with recursive subquery refactoring (CTE), which requires 11gR2:
with t (last_name, first_name, orig_rn, part, part_length, remaining) as (
select last_name, first_name,
row_number() over (order by last_name, first_name),
cast (null as varchar2(20)), 0, length(first_name)
from t42
union all
select last_name, first_name, orig_rn,
part || substr(first_name, part_length + 1, 1),
part_length + 1,
remaining - 1
from t
where remaining > 0
),
u as (
select last_name, first_name, orig_rn, part, part_length,
count(distinct orig_rn) over (partition by last_name) as last_name_count,
count(distinct orig_rn) over (partition by last_name, part) as part_count
from t
),
v as (
select last_name, first_name, orig_rn, part, last_name_count,
row_number() over (partition by orig_rn order by part_length) as rn
from u
where (part_count = 1 or part = first_name)
)
select case when last_name_count = 1 then null
when part = first_name then first_name || ' '
else part || '. '
end || last_name as condendsed_name
from v
where rn = 1
order by orig_rn;
Which gives:
CONDENSED_NAME
----------------------------------------------
Johnson
Chris Roberts
Christ. Roberts
Alb. Smith
Alp. Smith
J. Smith
The t
CTE is recursive. It starts with the original table rows and generates additional rows for each possible contraction of the first name:
with t (last_name, first_name, orig_rn, part, part_length, remaining) as (
select last_name, first_name,
row_number () over (order by last_name, first_name),
cast (null as varchar2(20)), 0, length(first_name)
from t42
union all
select last_name, first_name, orig_rn,
part || substr(first_name, part_length + 1, 1),
part_length + 1,
remaining - 1
from t
where remaining > 0
)
select last_name, first_name, part
from t
where last_name = 'Johnson'
order by orig_rn, part_length;
LAST_NAME FIRST_NAME PART
-------------------- -------------------- ------------------------
Johnson Charles
Johnson Charles C
Johnson Charles Ch
Johnson Charles Cha
Johnson Charles Char
Johnson Charles Charl
Johnson Charles Charle
Johnson Charles Charles
The next CTE, u
(yes, sorry about the names, I was uninspired) compares the values across all the rows and counts the occurrences. Anything with a count of 1
is unique.
...
u as (
select last_name, first_name, orig_rn, part, part_length,
count(distinct orig_rn) over (partition by last_name) as last_name_count,
count(distinct orig_rn) over (partition by last_name, part) as part_count
from t
)
select last_name, first_name, part, last_name_count, part_count
from u
where last_name = 'Roberts'
order by orig_rn, part_length;
LAST_NAME FIRST_NAME PART LAST_NAME_COUNT PART_COUNT
-------------------- -------------------- ------------------------ --------------- ----------
Roberts Chris 2 2
Roberts Chris C 2 2
Roberts Chris Ch 2 2
Roberts Chris Chr 2 2
Roberts Chris Chri 2 2
Roberts Chris Chris 2 2
Roberts Christian 2 2
Roberts Christian C 2 2
Roberts Christian Ch 2 2
Roberts Christian Chr 2 2
Roberts Christian Chri 2 2
Roberts Christian Chris 2 2
Roberts Christian Christ 2 1
Roberts Christian Christi 2 1
Roberts Christian Christia 2 1
Roberts Christian Christian 2 1
The third CTE v
only looks at the unique ones, and then ranks them according to the length of the unique value; so the shortest contraction of the first name for a record that is unique across all records is ranked as 1
.
...
v as (
select last_name, first_name, orig_rn, part, last_name_count,
row_number() over (partition by orig_rn order by part_length) as rn
from u
where (part_count = 1 or part = first_name)
)
select last_name, first_name, part, last_name_count
from v
where rn = 1
order by orig_rn;
LAST_NAME FIRST_NAME PART LAST_NAME_COUNT
-------------------- -------------------- ------------------------ ---------------
Johnson Charles 1
Roberts Chris Chris 2
Roberts Christian Christ 2
Smith Albert Alb 3
Smith Alphonse Alp 3
Smith Jason J 3
Then the final query just extracts those ranked 1
, which are the shortest unique values, and formats them the way you wanted.
If two people have exactly the same name then both are spelled out in full (demo) which seems to be what you want from your comment.
Not sure if this really qualifies as 'cleaner', except that it only hits the original table once.
Upvotes: 6
Reputation: 41
Try this:
with
last_names as (
select last_name, count(*) as last_name_count
from table_name
group by last_name )
select case
when b.last_name_count = 1 then a.last_name
else substr(a.first_name,1,1)||'. '||a.last_name
end as name
from table_name a
join last_names b
on a.last_name = b.last_name;
replacing table_name
with the right name.
Upvotes: 3