pats4u

Reputation: 271

Regex lookahead & lookbehind

I am trying to write a regex that uses lookahead and lookbehind to extract data by working backwards from a known marker.

In the message below, I am only interested in the column store error part: given the marker : search table error:, I need to extract the text back to the previous :.

Error processing. Reason: Exception: Job aborted due to failure: xxxxx (asasdasd): com.db.jdbc.exceptions.JDBCDriverException: DBTech JDBC: [2048]: column store error: search table error:  [123]

I am currently stuck with (?<=:)(.*?)(?=(: search table error)). This matches starting from the first : in the string instead of the last : before the marker.

Thank you for any help.

Upvotes: 2

Views: 334

Answers (1)

Ryszard Czech

Reputation: 18611

Use

:\s*([^:]*?)\s*: search table error

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  :                        ':'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^:]*?                   any character except: ':' (0 or more
                             times (matching the least amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  : search table error     ': search table error'

Python code:

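import re
# [^:]*? cannot cross a colon, so the capture is limited to the last
# colon-delimited field before ": search table error"
regex = r":\s*([^:]*?)\s*: search table error"
test_str = "Error processing. Reason: Exception: Job aborted due to failure: xxxxx (asasdasd): com.db.jdbc.exceptions.JDBCDriverException: DBTech JDBC: [2048]: column store error: search table error:  [123]"
match = re.search(regex, test_str)
if match:
    # group(1) is the captured field without the surrounding whitespace
    print(match.group(1))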

Results: column store error
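
To see why the pattern from the question starts too early, here is a small comparison sketch. It only uses the two patterns and the sample string already shown in this thread; the variable names `original` and `fixed` are just for illustration. The regex engine reports the leftmost position where a match is possible, and the lazy `.*?` only limits how far the match extends to the right, so it cannot move the start forward. Excluding `:` from the captured text is what forces the match to begin at the last colon before the marker.

```python
import re

test_str = (
    "Error processing. Reason: Exception: Job aborted due to failure: xxxxx (asasdasd): "
    "com.db.jdbc.exceptions.JDBCDriverException: DBTech JDBC: [2048]: "
    "column store error: search table error:  [123]"
)

# Pattern from the question: .*? may cross colons, so the match begins
# right after the first ':' in the string.
original = re.search(r"(?<=:)(.*?)(?=(: search table error))", test_str)

# Pattern from this answer: [^:]*? cannot cross a colon, so only the
# field directly before the marker can be captured.
fixed = re.search(r":\s*([^:]*?)\s*: search table error", test_str)

print(repr(original.group(1)))  # long text starting at ' Exception: ...'
print(repr(fixed.group(1)))     # 'column store error'
```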


Alternatively, with lookarounds:

(?<=:)[^:]*(?=:\s+search\s+table\s+error:)

See this regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  (?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
    :                        ':'
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  [^:]*                    any character except: ':' (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    :                        ':'
--------------------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    search                   'search'
--------------------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    table                    'table'
--------------------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    error:                   'error:'
--------------------------------------------------------------------------------
  )                        end of look-ahead

With Python code like the following (the match keeps the space that follows the preceding colon, hence the .strip()):

import re
regex = r"(?<=:)[^:]*(?=:\s+search\s+table\s+error:)"
test_str = "Error processing. Reason: Exception: Job aborted due to failure: xxxxx (asasdasd): com.db.jdbc.exceptions.JDBCDriverException: DBTech JDBC: [2048]: column store error: search table error:  [123]"
match = re.search(regex, test_str)
if match:
    print(match.group().strip())
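
If this is needed in more than one place, the lookaround version could be wrapped in a small helper. This is only a sketch: the names `MARKER_PATTERN` and `extract_error_category` are made up for illustration, and returning `None` when the marker is missing is an assumption about what the caller wants.

```python
import re
from typing import Optional

# Same lookaround pattern as above, compiled once for reuse.
MARKER_PATTERN = re.compile(r"(?<=:)[^:]*(?=:\s+search\s+table\s+error:)")


def extract_error_category(message: str) -> Optional[str]:
    """Return the colon-delimited field just before ': search table error:'.

    Hypothetical helper: returns None when the marker is not present.
    """
    match = MARKER_PATTERN.search(message)
    if match is None:
        return None
    # The raw match still carries the whitespace after the preceding colon.
    return match.group().strip()


sample = (
    "Error processing. Reason: Exception: Job aborted due to failure: xxxxx (asasdasd): "
    "com.db.jdbc.exceptions.JDBCDriverException: DBTech JDBC: [2048]: "
    "column store error: search table error:  [123]"
)
print(extract_error_category(sample))
print(extract_error_category("no marker in this line"))
```

The first call prints the extracted field and the second prints None, so callers can tell a missing marker apart from an empty field.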

Upvotes: 2
