Alejandro A

Reputation: 1190

Take first word after a regex match

I am trying to extract a substring from a string using regex. My function takes a word as a parameter, and the goal is to extract the very next word (by my own definition of a word) after that match. I have tried a lookbehind and some other approaches, but I could not get the expected results, so any help is welcome.

For example, in the first case the input to my function is **THttpServer**:

23:25:04.805: INFO: THttpServer: transportTCPChanged(state: DISCONNECTED 2)
23:25:13.120: INFO: THttpServer: transportUDPOpened(state: Port 54)

Expected result: transportTCPChanged and transportUDPOpened, respectively.

In another case, the input is CurrentUserConnection:

23:25:16.622: INFO: CurrentUserConnection#1:RQ : subscribed(userID: 1)
23:25:16.622: INFO: CurrentUserConnection#8:RP : disconnected

Expected result: subscribed, disconnected.

Things I have tried in Notepad++ (the lookbehind changes depending on the example):

(?<=THttpServer)(\w+) : No matches
(?<=THttpServer)(.*) : Obviously returns the whole rest of the line, not the expected match

I am a bit confused. Maybe it is not even possible? Or do I need some pre-processing?

Upvotes: 1

Views: 2074

Answers (1)

Wiktor Stribiżew

Reputation: 627607

You need to match the : after THttpServer, then any non-word chars up to the next word, and then match and capture that word with (\w+).

E.g. you may use

THttpServer:\W*(\w+)

Details

  • THttpServer: - a literal substring
  • \W* - any 0+ non-word chars
  • (\w+) - Capturing group 1 (later accessible via m.group(1)): 1 or more word chars.
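
To make the difference from the lookbehind attempt in the question concrete, here is a minimal sketch that runs both patterns on the first sample line. The variable names are only illustrative. The lookbehind version fails because \w+ has to start right after THttpServer, where the next character is a colon.

```python
import re

line = '23:25:04.805: INFO: THttpServer: transportTCPChanged(state: DISCONNECTED 2)'

# Attempt from the question: \w+ must begin immediately after "THttpServer",
# but the next character is ":", which is not a word char, so nothing matches.
failed = re.search(r'(?<=THttpServer)(\w+)', line)

# Consuming the colon and any non-word chars first lets (\w+) reach the word.
worked = re.search(r'THttpServer:\W*(\w+)', line)

print(failed)           # None
print(worked.group(1))  # transportTCPChanged
```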

See the Python demo:

import re
strs = ['23:25:04.805: INFO: THttpServer: transportTCPChanged(state: DISCONNECTED 2)',
        '23:25:13.120: INFO: THttpServer: transportUDPOpened(state: Port 54)']

rx = re.compile(r'THttpServer:\W*(\w+)')
for s in strs:
    m = rx.search(s)
    if m:
        print("Found '{}' in '{}'.".format(m.group(1), s))

Output:

Found 'transportTCPChanged' in '23:25:04.805: INFO: THttpServer: transportTCPChanged(state: DISCONNECTED 2)'.
Found 'transportUDPOpened' in '23:25:13.120: INFO: THttpServer: transportUDPOpened(state: Port 54)'.
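
The pattern above expects the colon right after the keyword, so it will not match the CurrentUserConnection#1:RQ : subscribed lines from the question. One possible generalization, as a sketch only: escape the keyword passed to the function, skip any non-whitespace suffix such as #1:RQ, then require a colon before the captured word. The helper name next_word_after is hypothetical, and the pattern assumes the log lines always look like the four samples in the question.

```python
import re

def next_word_after(keyword, line):
    """Return the first word after `keyword` and its following colon, or None.

    Hypothetical helper; assumes the log format shown in the question.
    """
    # re.escape keeps the keyword literal; \S* skips an optional suffix such
    # as "#1:RQ"; \s*:\s* consumes the separating colon; (\w+) is the word.
    pattern = re.escape(keyword) + r'\S*\s*:\s*(\w+)'
    m = re.search(pattern, line)
    return m.group(1) if m else None

lines = [
    ('THttpServer', '23:25:04.805: INFO: THttpServer: transportTCPChanged(state: DISCONNECTED 2)'),
    ('THttpServer', '23:25:13.120: INFO: THttpServer: transportUDPOpened(state: Port 54)'),
    ('CurrentUserConnection', '23:25:16.622: INFO: CurrentUserConnection#1:RQ : subscribed(userID: 1)'),
    ('CurrentUserConnection', '23:25:16.622: INFO: CurrentUserConnection#8:RP : disconnected'),
]

for keyword, line in lines:
    print(next_word_after(keyword, line))
```

If the suffix after the keyword can contain spaces, or if the word can be preceded by something other than a colon, the pattern would need to be adjusted accordingly.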

Upvotes: 1
