CommentLuv

Need an if/else for a regex

I have this regex to extract the name of a chatter in my IRC channel, along with capture groups for the hour and the message:

^\[(?:\d+)\-(?:\d+)(?:\-\d+) @ (\d+):\d+(?::\d+).\d+ (?:GMT|BST)\] (([^:]+)|\[[^\]]): ((?!\!).*)

It works on this chat line, giving me 'bearwolf3' as the 2nd capture group, which is what I want:

[04-04-2017 @ 12:45:39.204 BST] bearwolf3: Break Fast

But when a line like this one appears, I want to extract the name 'bladey2k14' from the IRC message my bot relays, i.e. the name between [ and ]:

[04-04-2017 @ 12:45:22.338 BST] loonycrewbot: [bladey2k14]: tyt romani :)

So the 2nd capture group would be 'bladey2k14'.

I've seen if/then/else (conditional) regex examples, but I can't get them to work for this, and they're making my brain hurt!

Can anyone modify my regex at the top to do this?

You can see it here. I want match 2 to have group 2 as bladey2k14 and group 3 as the message 'tyt romani'.
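
To make the problem concrete, here is a small Python check of the original pattern against both sample lines (the constant names are illustrative). Note that in this pattern the name lands in both the outer group 2 and the nested group 3, while the message is group 4. On the relayed line, the first alternative [^:]+ simply grabs loonycrewbot, and the bracketed name ends up inside the message.

```python
import re

# The asker's pattern, copied verbatim from the question.
ORIGINAL = re.compile(
    r"^\[(?:\d+)\-(?:\d+)(?:\-\d+) @ (\d+):\d+(?::\d+).\d+ (?:GMT|BST)\] "
    r"(([^:]+)|\[[^\]]): ((?!\!).*)"
)

PLAIN = "[04-04-2017 @ 12:45:39.204 BST] bearwolf3: Break Fast"
RELAYED = "[04-04-2017 @ 12:45:22.338 BST] loonycrewbot: [bladey2k14]: tyt romani :)"

for line in (PLAIN, RELAYED):
    m = ORIGINAL.match(line)
    # Group 2 is the outer name group; group 4 is the message.
    print(repr(m.group(2)), repr(m.group(4)))
```

This prints 'bearwolf3' 'Break Fast' for the first line, but 'loonycrewbot' '[bladey2k14]: tyt romani :)' for the relayed one.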

Upvotes: 1

Views: 63

Answers (1)

Wiktor Stribiżew

You may try using the following expression:

^\[\d+-\d+-\d+ @ (\d+):\d+:\d+\.\d+ (?:GMT|BST)\] (?|([^:]+)(?!:\s*\[[^\]]*])|[^:]+:\s*\[([^\]]*)]): ([\w\s]*)

See the regex demo

In a PCRE regex, a branch reset group (?|...|...) restarts group numbering at the same value in each alternative, so capturing groups in different branches share the same numbers. Here, in (?|([^:]+)(?!:\s*\[[^\]]*])|[^:]+:\s*\[([^\]]*)]), both ([^:]+) and ([^\]]*) capture their value into Group 2, whichever branch matches.
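
Python's built-in re module does not support branch reset groups (the third-party regex module does), so a minimal Python sketch can emulate one instead: give each alternative its own capturing group and take whichever one participated in the match. The function name parse_chat_line and the constant CHAT_RE are hypothetical, for illustration only.

```python
import re

# Same pattern as the answer, except that (?|...) is replaced by a plain
# non-capturing group (?:...), because Python's re has no branch reset.
# Group 2 holds a plain name, group 3 a relayed [name], group 4 the message.
CHAT_RE = re.compile(
    r"^\[\d+-\d+-\d+ @ (\d+):\d+:\d+\.\d+ (?:GMT|BST)\] "
    r"(?:([^:]+)(?!:\s*\[[^\]]*])|[^:]+:\s*\[([^\]]*)]): ([\w\s]*)"
)


def parse_chat_line(line):
    """Return (hour, name, message), or None if the line does not match."""
    m = CHAT_RE.match(line)
    if m is None:
        return None
    hour, plain_name, relayed_name, message = m.groups()
    # Only one alternative takes part in the final match, so exactly one
    # name group is set; picking it reproduces what the branch reset does.
    name = plain_name if plain_name is not None else relayed_name
    return hour, name, message


print(parse_chat_line("[04-04-2017 @ 12:45:39.204 BST] bearwolf3: Break Fast"))
print(parse_chat_line(
    "[04-04-2017 @ 12:45:22.338 BST] loonycrewbot: [bladey2k14]: tyt romani :)"
))
```

The two calls return ('12', 'bearwolf3', 'Break Fast') and ('12', 'bladey2k14', 'tyt romani '). The trailing space in the second message comes from [\w\s]*, which stops at the ":" of ":)".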

I also removed unnecessary non-capturing groups, like (?:\d+): these groups are neither quantified nor contain any alternation operators, so they do not change what is matched.
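
One way to check that dropping the (?:...) wrappers changes nothing is to compare the question's timestamp prefix with the answer's on the sample lines. The comparison also exposes the one other change in that part of the pattern: the answer escapes the dot before the milliseconds (\.), while the question's unescaped . matches any character. The comma-separated timestamp below is a made-up input used only to show that difference.

```python
import re

# Timestamp prefix from the question (redundant (?:...) groups, unescaped dot)
OLD_PREFIX = re.compile(r"^\[(?:\d+)\-(?:\d+)(?:\-\d+) @ (\d+):\d+(?::\d+).\d+ (?:GMT|BST)\] ")
# Timestamp prefix from the answer (groups removed, dot escaped)
NEW_PREFIX = re.compile(r"^\[\d+-\d+-\d+ @ (\d+):\d+:\d+\.\d+ (?:GMT|BST)\] ")

samples = [
    "[04-04-2017 @ 12:45:39.204 BST] bearwolf3: Break Fast",
    "[04-04-2017 @ 12:45:22.338 BST] loonycrewbot: [bladey2k14]: tyt romani :)",
]

for s in samples:
    old, new = OLD_PREFIX.match(s), NEW_PREFIX.match(s)
    # Same matched span and the same hour in group 1
    print(old.group(0) == new.group(0), old.group(1), new.group(1))

# The unescaped "." in the old prefix accepts any character there, e.g. a comma.
comma = "[04-04-2017 @ 12:45:39,204 BST] bearwolf3: Break Fast"
print(OLD_PREFIX.match(comma) is not None, NEW_PREFIX.match(comma) is not None)
```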

The parts I changed are (?|([^:]+)(?!:\s*\[[^\]]*])|[^:]+:\s*\[([^\]]*)]) and [\w\s]*:

  • (?|([^:]+)(?!:\s*\[[^\]]*])|[^:]+:\s*\[([^\]]*)]) matches 1 of 2 alternatives:
    • ([^:]+)(?!:\s*\[[^\]]*]): 1 or more chars other than :, captured into Group 2 (with ([^:]+)), that are not followed by :, 0+ whitespace chars, [, 0+ chars other than ], and ] (checked with the negative lookahead (?!:\s*\[[^\]]*]))
    • | - or
    • [^:]+:\s*\[([^\]]*)] - 1+ chars other than :, followed by :, 0+ whitespace chars, [, 0+ chars other than ] captured into (again) Group 2, and then ].
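
To see why the negative lookahead in the first alternative is needed, this sketch compares the name part with and without it on the relayed line. As before, (?:...) stands in for the branch reset, so the two name alternatives show up as groups 2 and 3. Without the lookahead, the first alternative matches loonycrewbot and the whole pattern still succeeds (with an empty message), because alternatives are tried left to right and the first one that lets the rest of the pattern match wins.

```python
import re

PREFIX = r"^\[\d+-\d+-\d+ @ (\d+):\d+:\d+\.\d+ (?:GMT|BST)\] "
# Name part with the negative lookahead (as in the answer) and without it.
WITH_LOOKAHEAD = re.compile(
    PREFIX + r"(?:([^:]+)(?!:\s*\[[^\]]*])|[^:]+:\s*\[([^\]]*)]): ([\w\s]*)"
)
WITHOUT_LOOKAHEAD = re.compile(
    PREFIX + r"(?:([^:]+)|[^:]+:\s*\[([^\]]*)]): ([\w\s]*)"
)

relayed = "[04-04-2017 @ 12:45:22.338 BST] loonycrewbot: [bladey2k14]: tyt romani :)"

for label, pattern in (("with", WITH_LOOKAHEAD), ("without", WITHOUT_LOOKAHEAD)):
    m = pattern.match(relayed)
    # groups 2 and 3 are the two name alternatives, group 4 is the message
    print(label, m.group(2), m.group(3), repr(m.group(4)))
```

With the lookahead, group 2 is None and group 3 is bladey2k14; without it, group 2 is loonycrewbot and the message is ''.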

The [\w\s]* matches 0+ chars that are letters/digits/_/whitespace.
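
Since nothing follows [\w\s]* in the pattern (no $ anchor), it simply stops at the first character that is neither a word char nor whitespace. A short Python check on a few messages shows the effect; "hello, world" is a made-up example. If the whole rest of the line is wanted, a greedy (.*) ending, as in the question's ((?!\!).*), would keep it.

```python
import re

MESSAGE = re.compile(r"[\w\s]*")

for text in ("Break Fast", "tyt romani :)", "hello, world"):
    # match() anchors at the start; [\w\s]* stops at the first other char
    print(repr(MESSAGE.match(text).group(0)))
```

This prints 'Break Fast', 'tyt romani ' (trailing space kept, smiley dropped) and 'hello'.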

Upvotes: 1