monte
monte

Reputation: 1885

Postgresql Regex pattern matching before a word or end of line

I have the following rows (sample) in my postgresql from which I am trying to extracts words.

subset:|actuals_source:|comments:February_2021_FFRPrice_RegModelsOnly_Demand Sales Volume*CSOPprice_Commit_a57f193aede
subset:|actuals_source:|comments:china_market_CSOP_Demand Sales Volume_PFJan2021
subset:|actuals_source:zeros|comments:Functional_test_2_Cores_FWD_facing_regressor_leak_fix_commit_c596a521e07

The words I am trying to extract starts from comments: and are before Commit_c596a521e07 if the Commit word exists or it's till the end of line. (Also note that Commit can also be of lowercase as in commit)

So the extracted column should look like:

February_2021_FFRPrice_RegModelsOnly_Demand Sales Volume*CSOPprice
china_market_CSOP_Demand Sales Volume_PFJan2021
Functional_test_2_Cores_FWD_facing_regressor_leak_fix

I have tried the following pattern, but unfortunately it returns everything after Commit

substring(col_name from '.*comments:(.*?)(?:Commit|$)')

Is there a way to limit the selection, such that it returns words only before Commit (wherever Commit exists) or till the end of line.

Upvotes: 0

Views: 1008

Answers (1)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 627517

You can match c in Commit in a case insensitive way using [cC] and make sure you match a _ before Commit and use

substring(col_name from 'comments:(.*?)(_[Cc]ommit|$)')

Note that the substring function returns the Group 1 value here:

If the pattern contains any parentheses, the portion of the text that matched the first parenthesized subexpression (the one whose left parenthesis comes first) is returned

The pattern matches

  • comments: - a literal string
  • (.*?) - Group 1: any zero or more chars as few as possible
  • (_[Cc]ommit|$) - Group 2 (this value won't be returned): _Commit or _commit or end of string.

See a PostgreSQL demo:

WITH demo(col_name) AS (
   VALUES
      ('subset:|actuals_source:|comments:February_2021_FFRPrice_RegModelsOnly_Demand Sales Volume*CSOPprice_Commit_a57f193aede'),
      ('subset:|actuals_source:|comments:china_market_CSOP_Demand Sales Volume_PFJan2021'),
      ('subset:|actuals_source:zeros|comments:Functional_test_2_Cores_FWD_facing_regressor_leak_fix_commit_c596a521e07')
   )
SELECT substring(col_name from 'comments:(.*?)(_[Cc]ommit|$)') FROM demo;

Output:

enter image description here

Upvotes: 2

Related Questions