Create unique string of first and last name in Oracle

Question

I can do this programmatically, but was looking for a cleaner solution.

Let's say I have the following table:

First Name      Last Name
Smith           Albert       
Smith           Alphonse    
Smith           Jason         
Johnson         Charles
Roberts         Chris
Roberts         Christian

I want to a create a unique with the following rules

If the last name is already unique just return just last name
If same last name return first initial (or more) followed by a period then last name

For Albert Smith I would return Alb.Smith
For Charles Johnson I would return Johnson
For Christion Roberts I would return Christ.Roberts

Does anyone have any thoughts on how to accomplish this directly in an Oracle SQL statement or should I stick to doing it in a program?

Alex Poole · Accepted Answer

Version with recursive subquery refactoring (CTE), which requires 11gR2:

with t (last_name, first_name, orig_rn, part, part_length, remaining) as (
  select last_name, first_name,
    row_number() over (order by last_name, first_name),
    cast (null as varchar2(20)), 0, length(first_name)
  from t42
  union all
  select last_name, first_name, orig_rn,
    part || substr(first_name, part_length + 1, 1),
    part_length + 1,
    remaining - 1
  from t
  where remaining > 0
),
u as (
  select last_name, first_name, orig_rn, part, part_length,
    count(distinct orig_rn) over (partition by last_name) as last_name_count,
    count(distinct orig_rn) over (partition by last_name, part) as part_count
  from t
),
v as (
  select last_name, first_name, orig_rn, part, last_name_count,
  row_number() over (partition by orig_rn order by part_length) as rn
  from u
  where (part_count = 1 or part = first_name)
)
select case when last_name_count = 1 then null
  when part = first_name then first_name || ' '
  else part || '. '
  end || last_name as condendsed_name
from v
where rn = 1
order by orig_rn;

Which gives:

CONDENSED_NAME                               
----------------------------------------------
Johnson                                        
Chris Roberts                                  
Christ. Roberts                                
Alb. Smith                                     
Alp. Smith                                     
J. Smith

SQL Fiddle.

The t CTE is recursive. It starts with the original table rows and generates additional rows for each possible contraction of the first name:

with t (last_name, first_name, orig_rn, part, part_length, remaining) as (
  select last_name, first_name,
    row_number () over (order by last_name, first_name),
    cast (null as varchar2(20)), 0, length(first_name)
  from t42
  union all
  select last_name, first_name, orig_rn,
    part || substr(first_name, part_length + 1, 1),
    part_length + 1,
    remaining - 1
  from t
  where remaining > 0
)
select last_name, first_name, part
from t
where last_name = 'Johnson'
order by orig_rn, part_length;

LAST_NAME            FIRST_NAME           PART                   
-------------------- -------------------- ------------------------
Johnson              Charles                                       
Johnson              Charles              C                        
Johnson              Charles              Ch                       
Johnson              Charles              Cha                      
Johnson              Charles              Char                     
Johnson              Charles              Charl                    
Johnson              Charles              Charle                   
Johnson              Charles              Charles

The next CTE, u (yes, sorry about the names, I was uninspired) compares the values across all the rows and counts the occurrences. Anything with a count of 1 is unique.

...
u as (
  select last_name, first_name, orig_rn, part, part_length,
    count(distinct orig_rn) over (partition by last_name) as last_name_count,
    count(distinct orig_rn) over (partition by last_name, part) as part_count
  from t
)
select last_name, first_name, part, last_name_count, part_count
from u
where last_name = 'Roberts'
order by orig_rn, part_length;

LAST_NAME            FIRST_NAME           PART                     LAST_NAME_COUNT PART_COUNT
-------------------- -------------------- ------------------------ --------------- ----------
Roberts              Chris                                                       2          2 
Roberts              Chris                C                                      2          2 
Roberts              Chris                Ch                                     2          2 
Roberts              Chris                Chr                                    2          2 
Roberts              Chris                Chri                                   2          2 
Roberts              Chris                Chris                                  2          2 
Roberts              Christian                                                   2          2 
Roberts              Christian            C                                      2          2 
Roberts              Christian            Ch                                     2          2 
Roberts              Christian            Chr                                    2          2 
Roberts              Christian            Chri                                   2          2 
Roberts              Christian            Chris                                  2          2 
Roberts              Christian            Christ                                 2          1 
Roberts              Christian            Christi                                2          1 
Roberts              Christian            Christia                               2          1 
Roberts              Christian            Christian                              2          1

The third CTE v only looks at the unique ones, and then ranks them according to the length of the unique value; so the shortest contraction of the first name for a record that is unique across all records is ranked as 1.

...
v as (
  select last_name, first_name, orig_rn, part, last_name_count,
  row_number() over (partition by orig_rn order by part_length) as rn
  from u
  where (part_count = 1 or part = first_name)
)
select last_name, first_name, part, last_name_count
from v
where rn = 1
order by orig_rn;

LAST_NAME            FIRST_NAME           PART                     LAST_NAME_COUNT
-------------------- -------------------- ------------------------ ---------------
Johnson              Charles                                                     1 
Roberts              Chris                Chris                                  2 
Roberts              Christian            Christ                                 2 
Smith                Albert               Alb                                    3 
Smith                Alphonse             Alp                                    3 
Smith                Jason                J                                      3

Then the final query just extracts those ranked 1, which are the shortest unique values, and formats them the way you wanted.

If two people have exactly the same name then both are spelled out in full (demo) which seems to be what you want from your comment.

Not sure if this really qualifies as 'cleaner', except that it only hits the original table once.

Create unique string of first and last name in Oracle

Answers (2)

Related Questions