J Sparling
J Sparling

Reputation: 7

Regex to split values in PostgreSQL

I have a list of values coming from a PGSQL database that looks something like this:

198
199
1S
2
20
997
998
999
C1
C10
A

I'm looking to parse this field a bit into individual components, which I assume would take two regexp_replace function uses in my SQL. Essentially, any non-numeric character that appears before numeric ones needs to be returned for one column, and the other column would show all non-numeric characters appearing AFTER numeric ones.

The above list would then be split into this layout as the result from PG:

Table Format

I have created a function that strips out the non-numeric characters (the last column) and casts it as an Integer, but I can't figure out the regex to return the string values prior to the number, or those found after the number.

All I could come up with so far, with my next to non-existant regex knowledge, was this: regexp_replace(fieldname, '[^A-Z]+', '', 'g'), which just strips out anything not A-Z, but I can;t get to to work with strings before numeric values, or after them.

Upvotes: 0

Views: 5587

Answers (2)

trincot
trincot

Reputation: 350300

For extracting the characters before the digits:

regexp_replace(fieldname, '\d.*$', '')

For extracting the characters after the digits:

regexp_replace(fieldname, '^([^\d]*\d*)', '')

Note that:

  • if there are no digits, the first will return the original value and then second an empty string. This way you are sure that the concatenation is equal to the original value in this case also.
  • the concatenation of the three parts will not return the original if there are non-numerical characters surrounded by digits: those will be lost.
  • This also works for any non-alphanumeric characters like @, [, ! ...etc.

Final SQL

select
  fieldname as original,
  regexp_replace(fieldname, '\d.*$', '') as before_s,
  regexp_replace(fieldname, '^([^\d]*\d*)', '') as after_s,
  cast(nullif(regexp_replace(fieldname, '[^\d]', '', 'g'), '') as integer) as number
from mytable;  

See fiddle.

Upvotes: 1

Kamil Gosciminski
Kamil Gosciminski

Reputation: 17157

This answer relies on information you delivered, which is

Essentially, any non-numeric character that appears before numeric ones needs to be returned for one column, and the other column would show all non-numeric characters appearing AFTER numeric ones.

  1. Everything non-numeric before a numeric value into 1 column
  2. Everything non-numeric after a numeric value into 2 column

So there's assumption that you have a value that has a numeric value in it.

select 
  val,
  regexp_matches(val,'([a-zA-Z]*)\d+') AS before_numeric,
  regexp_matches(val,'\d+([a-zA-Z]*)') AS after_numeric
from 
  val;

Attached SQLFiddle for a preview.

Upvotes: 0

Related Questions