axeMaltesse
axeMaltesse

Reputation: 67

PostgreSQL regex - split column to array

I have a table music:

author                |  music
----------------------+-------
Kevin Clein           |   a
Gucio G. Gustawo      |   b
R. R. Andrzej         |   c
John McKnight Burman  |   d

How can I split a column which contain two different symbols (space and dot) and how to split name and surmane correctly to have result like:

author                |  name   | surname
----------------------+---------+----------------
Kevin Clein           |   Kevin | Clein           
Gucio G. Gustawo      | Gucio G.| Gustawo
R. R. Andrzej         |   R. R. | Andrzej
John McKnight Burman  |   John  | McKnight Burman

I have tried something like that so far:

WITH ad AS(
SELECT author,
  s[1] AS name,
  s[2] AS surname
  FROM (SELECT music.*,
  regexp_split_to_array(music.author,E'\\s[.]') AS s
       FROM music)t
)SELECT * FROM ad;

Upvotes: 0

Views: 1717

Answers (1)

Jorge Campos
Jorge Campos

Reputation: 23361

I've create a possible solution to you. Be aware that it may not solve all problems and you will need to create an extra table to solve rules problem. By rule I mean what I've said in the comments like:

When to decide which is name and surname.

So in order to solve your problem I had to create another table that will handle surnames that should be considered as so.

The test case scenario:

create table surname (
  id SERIAL NOT NULL primary key,
  sample varchar(100)
);

--Test case inserts
insert into surname (sample) values ('McKnight'), ('McGregory'), ('Willian'), ('Knight');

create table music (
  id SERIAL NOT NULL primary key,
  author varchar(100)
);

insert into music (author) values
('Kevin Clein'),
('Gucio G. Gustawo'),
('R. R. Andrzej'),
('John McKnight Burman'),
('John Willian Smith'),
('John Williame Smith');

And My proposed solution:

select author,
       trim(replace(author, surname, '')) as name,
       surname
  from (
    select author,
          case when position(s.sample in m.author)>0 
          then (regexp_split_to_array( m.author, '\s(?='||s.sample||')' ))[2]::text
          else trim(substring( author from '\s\w+$'  ))
           end as surname
      from music m left join surname s 
        on m.author like '%'||s.sample||'%'
     where case when position(s.sample in m.author)>0 
          then (regexp_split_to_array( m.author, '\s(?='||s.sample||')' ))[2]::text
          else trim(substring( author from '\s\w+$'  )) end is not null
       ) as x

The output will be:

   AUTHOR              NAME             SURNAME
------------------------------------------------------------   
Kevin Clein            Kevin            Clein
Gucio G. Gustawo       Gucio G.         Gustawo
R. R. Andrzej          R. R.            Andrzej
John McKnight Burman   John             McKnight Burman
John Willian Smith     John             Willian Smith
John Williame Smith    John Williame    Smith

See it working here: http://sqlfiddle.com/#!15/c583f/2

In the table surname you will insert all names that should be considered as surname.

You may want to sub-query the query that do the case expression so you would use just the field instead of the hole case statement again on the where clause.

Upvotes: 1

Related Questions