jian
jian

Reputation: 4824

extract first two word from an string exclude comma using regex in Postgresql

select regexp_matches('Hi, I am Harry Potter', '^[a-zA-Z0-9]*\W+\w+');
select regexp_matches('Hi, I am Harry Potter', '\w+\W+\w+');

Both way returns {Hi, I} But expect {Hi I}. Related Question: extract first word in an String: extract the first word from a string - regex

Upvotes: 1

Views: 676

Answers (2)

Laurenz Albe
Laurenz Albe

Reputation: 246513

You can use this pattern:

select regexp_match(
          'Hi, I am Harry Potter',
          '^([[:alnum:]]+)[^[:alnum:]]+([[:alnum:]]+)'
       );

 regexp_matches 
════════════════
 {Hi,I}
(1 row)

The pattern matches the first sequence of alphanumerical characters, then a sequence of non-alphanumerical characters and another sequence of alphanumerical characters. The result is an array with the first and third expression, which are parenthesized in the pattern.

Upvotes: 0

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626835

You cannot match disjoint (non-adjoining) parts of a string into a single group.

You can use REGEXP_REPLACE to capture the first two words into separate groups and then use two backreferences to the group in the replacement pattern to get what you need:

select regexp_replace('Hi, I am Harry Potter', '^\W*(\w+)\W+(\w+).*', '\1 \2');

See the online demo. The regex means

  • ^ - start of string
  • \W* - zero or more non-word chars
  • (\w+) - Group 1 (\1): one or more word chars
  • \W+ - one or more non-word chars
  • (\w+) - Group 2 (\2): one or more word chars
  • .* - the rest of the string.

Upvotes: 2

Related Questions