Reputation: 3890

regex not capturing correctly

~\[(.*)\] (.*): (.*)~s

The desired behavior is to capture the text between [ and ] (the first occurrence of both). So in this case:

[7/25/2015 8:40:18 PM] Ghost: [Saturday, July 25, 2015 8:13 PM] Nathan: 

<<< Quoted text

7/25/2015 8:40:18 PM should be captured. However, as you can see in the regex101 example, the captured text is 7/25/2015 8:40:18 PM] Ghost: [Saturday, July 25, 2015 8:13 PM.

I have no idea how this is happening. Any help is appreciated! Thanks!
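The behavior can be reproduced with Python's re module, which treats this pattern the same way PCRE does. Python is only an assumption for illustration, since the ~...~s delimiters suggest PHP. re.DOTALL stands in for the /s modifier, and the sample text keeps the trailing space after "Nathan:" from the question.

```python
import re

# Sample text from the question (note the trailing space after "Nathan:").
text = "[7/25/2015 8:40:18 PM] Ghost: [Saturday, July 25, 2015 8:13 PM] Nathan: \n\n<<< Quoted text"

# The original pattern; re.DOTALL mirrors the PCRE /s modifier.
greedy = re.compile(r"\[(.*)\] (.*): (.*)", re.DOTALL)

m = greedy.search(text)
# The greedy .* in group 1 runs to the end of the string, then backtracks
# only as far as the LAST "] " that still lets the rest of the pattern match.
print(m.group(1))  # 7/25/2015 8:40:18 PM] Ghost: [Saturday, July 25, 2015 8:13 PM
print(m.group(2))  # Nathan
```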

Upvotes: 0

Answers (3)

Wiktor Stribiżew

Reputation: 626689

Capturing the first occurrence of text inside [...] can be achieved with a much simpler regex:

\[([^]]*)]

See demo

Judging by the sample data, there cannot be any nested [...] sequences, and there should be no stray ] inside the square brackets. Thus, a negated character class looks best here.

Here is what the regex means:

\[ - match a literal [
([^]]*) - match and capture into Group 1 zero or more characters other than ] (a ] does not need escaping inside a character class when it is the first character, right after ^; this holds in PCRE and Python, but in JavaScript [^] means "any character", so write [^\]] there)
] - match a literal ] (again, this closing square bracket is unambiguous because the opening [ before it is escaped)

Without the g (global) option this matches only the first occurrence. To get that behavior, use your language's single-match function (for example, preg_match rather than preg_match_all in PHP).
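A minimal sketch of the negated-class pattern in Python, assuming the question's sample text. re.search plays the role of a single (non-global) match, and re.findall is shown for comparison as the "all occurrences" variant.

```python
import re

text = "[7/25/2015 8:40:18 PM] Ghost: [Saturday, July 25, 2015 8:13 PM] Nathan: \n\n<<< Quoted text"

# [^]]* : zero or more characters other than "]". A "]" placed first in the
# class (right after ^) is literal in both PCRE and Python's re.
bracketed = re.compile(r"\[([^]]*)]")

# re.search stops at the first match, like a regex without the g option.
first = bracketed.search(text).group(1)
print(first)  # 7/25/2015 8:40:18 PM

# re.findall returns every match, comparable to a global search.
everything = bracketed.findall(text)
print(everything)  # ['7/25/2015 8:40:18 PM', 'Saturday, July 25, 2015 8:13 PM']
```

Because [^]]* can never step over a ], the match cannot run past the first closing bracket, so no lazy quantifier or backtracking is involved.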

If you need this first occurrence to be at the beginning of the string or line, add the ^ anchor (for ^ to match at the start of every line, you need the /m modifier):

^\[([^]]*)]

See another demo
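A sketch of the anchored version, where re.MULTILINE corresponds to the /m modifier. The three-line log below is hypothetical, built in the same format as the question's sample.

```python
import re

# Hypothetical chat log in the question's format (invented for illustration).
log = (
    "[7/25/2015 8:40:18 PM] Ghost: [Saturday, July 25, 2015 8:13 PM] Nathan: \n"
    "<<< Quoted text\n"
    "[7/25/2015 8:41:02 PM] Nathan: hello"
)

# With re.MULTILINE, ^ matches at the start of every line,
# so the bracket in the middle of line 1 is skipped.
line_start = re.compile(r"^\[([^]]*)]", re.MULTILINE)
print(line_start.findall(log))  # ['7/25/2015 8:40:18 PM', '7/25/2015 8:41:02 PM']

# Without the flag, ^ only matches at the very start of the string.
print(re.findall(r"^\[([^]]*)]", log))  # ['7/25/2015 8:40:18 PM']
```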

Upvotes: 2

Aydin

Reputation: 15284

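This will extract both date/time values. Note that the sample below has the square brackets removed; on the original bracketed text the second group cannot match the [ before Saturday.

Sample input:

7/25/2015 8:40:18 PM Ghost: Saturday, July 25, 2015 8:13 PM Nathan:

Pattern: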

(\d+\/\d+\/\d+ \d+\:\d+\:\d+ [AP]M)[^:]*: ([A-Z][a-z]+\, [A-Z][a-z]* \d+, \d+ \d+:\d+ [AP]M)
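A Python sketch of this timestamp-extraction approach, written with [AP] instead of [A|P] (inside a character class, | is a literal pipe, not alternation) and with unescaped / and :, which Python does not require. It runs against the bracket-free sample above.

```python
import re

# Bracket-free sample from the answer.
sample = "7/25/2015 8:40:18 PM Ghost: Saturday, July 25, 2015 8:13 PM Nathan:"

# Group 1: numeric date and time; [^:]* skips the speaker name up to ": ".
# Group 2: the spelled-out date. [AP] matches a single "A" or "P".
stamps = re.compile(
    r"(\d+/\d+/\d+ \d+:\d+:\d+ [AP]M)[^:]*: "
    r"([A-Z][a-z]+, [A-Z][a-z]* \d+, \d+ \d+:\d+ [AP]M)"
)

m = stamps.search(sample)
print(m.groups())  # ('7/25/2015 8:40:18 PM', 'Saturday, July 25, 2015 8:13 PM')
```

On the question's original text, with its square brackets, stamps.search returns None, because [A-Z] cannot match the [ that precedes Saturday.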

Upvotes: 0

CaptainCap

Reputation: 334

You need to make the first two .* quantifiers non-greedy (lazy) so that they stop at the first ] and the first colon:

\[(.*?)\] (.*?): (.*)
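A sketch of the lazy version in Python, assuming the question's sample text and re.DOTALL in place of the /s modifier. Only the first two groups are lazy; the last (.*) stays greedy and takes the rest of the string.

```python
import re

text = "[7/25/2015 8:40:18 PM] Ghost: [Saturday, July 25, 2015 8:13 PM] Nathan: \n\n<<< Quoted text"

# .*? expands one character at a time, so group 1 ends at the FIRST "] "
# and group 2 ends at the first ": " after it.
lazy = re.compile(r"\[(.*?)\] (.*?): (.*)", re.DOTALL)

m = lazy.search(text)
print(m.group(1))  # 7/25/2015 8:40:18 PM
print(m.group(2))  # Ghost
```

Group 3 then holds everything after "Ghost: ", including the second bracketed timestamp and the quoted text.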

Upvotes: 0
