anon
anon

Reputation:

Simplest way to split human name in postgres?

Consider a table with full human names:

create table names (full_name varchar not null);   
insert into names (full_name) 
  values ('Jane Marie Doe'), ('John Doe');

In postgres, what is the simplest (most readable) way to split the names into first and last?

 first_name | last_name 
------------+-----------
 Jane Marie | Doe
 John       | Doe
(2 rows)

This is what I came up with, but it seems really complicated. Surely there's a simpler way?

select 
  array_to_string(ary[1:len - 1], ' ') as first_name, 
  ary[len] as last_name 
from (
  select ary, array_length(ary, 1) as len
  from (
    select regexp_split_to_array(full_name, E'\\s+') as ary 
    from names
  ) sub1
) sub2;

Upvotes: 4

Views: 9846

Answers (5)

moonshot
moonshot

Reputation: 699

The top answer has the correct regex for last name. For first name it is not accurate for the sample the OP provided like "Jane Marie Doe". You need a regex with a capturing group:

SELECT regexp_replace(full_name,'(.+)\s\S+$','\1') as first_name 
SELECT regexp_replace(full_name,'.+[\s]','') as last_name

Upvotes: 0

Reculos Gerbi Neto
Reculos Gerbi Neto

Reputation: 449

A simple way to get first name and the rest of the name separately:

select regexp_replace('John Martin Doe','\\s.*','') as first_name, regexp_replace('John Martin Doe','[a-zA-Z]+','') as last_name;

Upvotes: 0

F. Stephen Q
F. Stephen Q

Reputation: 4306

You could try

SELECT regexp_split_to_array(full_name, E'\\s\\S+') as ary

instead, as that should split on the last space consistently. Then you know there will only be two members of ary

EDIT: On further reflection, I think the best approach might be to use

SELECT regexp_replace(full_name,'\s\S+','') as first_name
SELECT regexp_replace(full_name,'.+[\s]','') as last_name

Upvotes: 7

Craig Ringer
Craig Ringer

Reputation: 324485

For the record, the current design is correct, and you shouldn't change it.

If you attempt to do so the only vaguely right way to do it is ask each user what their name parts are. If you wish you can attempt to guess this for simple-looking two part names, but even then you don't know if "kim seolhyun" is "Kim, Seolhyun" or "Seolhyun, Kim".

A company I worked with did a name split recently. They did it in the least-awful way I've yet seen:

  • Sent email to all their users telling them about it in advance
  • Changed the "name" field to "preferred display name"
  • Added new "Given Name" and "Family Name" fields, avoiding the confusing trap of calling them "first" and "last" names
  • Added a new field "Alternate name(s)" for people who have anglicised and native-language names, transliterated and original charset names, etc
  • Attempted to guess for simple-looking names, left the rest unchanged
  • Sent email to their users to offer them a page they could use to check the name split (if guessed) or enter their name parts (if not guessed).
  • Made it clear that if you only have one name you can enter it in the given name field and leave family name blank. Did not enforce family name.
  • Did not make any attempt to validate names. Your name is ~~M^2? Good for you, you're weird.
  • Did not enforce capitalization, spaces, etc. Names like de Vinh are fine
  • Did not screw up their text encoding handling.

Basically, any attempt to use regex to split names is wrong. If you really insist on doing it, do it only for names that "look" simple, e.g.

SELECT regexp_matches('Fred Smith', '^([A-Z][[:alpha:]]+) ([A-Z][[:alpha:]]+)$');

which will only match names that have two obvious parts, both beginning with a capital letter, and contain only letters (accented or not). Even then, it doesn't solve the "given family" or "family given" problem. Other names won't match, and should be left as-is for the user to change.

Upvotes: 4

Vamsi Prabhala
Vamsi Prabhala

Reputation: 49260

One more way to do it with string functions. Find the first space from the end and select it as last name and then replace the last name string in the original string with ''.

with lname as 
(select name, 
reverse(substr(reverse(name),1,strpos(reverse(name),' '))) as last_name
from names)
select replace(name,last_name,'') as first_name, last_name
from lname

Upvotes: 1

Related Questions