Antti

Reputation: 313

Capture between pattern of digits

I'm stuck trying to capture a structure like this:

1:1 wefeff qwefejä qwefjk
dfjdf 10:2 jdskjdksdjö
12:1 qwe qwe: qwertyå

I want each match to start at a marker made of digits, a colon, and more digits (such as 1:1) and to run up to the next such marker, even across line breaks. So the expected output would be:

match 1 = 1:1 wefeff qwefejä qwefjk dfjdf
match 2 = 10:2 jdskjdksdjö
match 3 = 12:1 qwe qwe: qwertyå

Here's what I have tried:

\d+\:\d+.+

But that fails when the text after a marker continues onto the next line.

I'm using a JavaScript-based regex engine.
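A minimal sketch of the failure, assuming the sample is held in one string with \n line breaks (the variable names are only for illustration):

```javascript
// Sample from the question, assumed to be a single string with "\n" line breaks.
const sample =
  "1:1 wefeff qwefejä qwefjk\ndfjdf 10:2 jdskjdksdjö\n12:1 qwe qwe: qwertyå";

// Without the s flag, "." does not match line terminators,
// so every match stops at the end of the line it started on.
const attempt = sample.match(/\d+\:\d+.+/g);

console.log(attempt);
// The word "dfjdf" at the start of the second line is not part of any match.
```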

Upvotes: 1

Views: 63

Answers (2)

Wiktor Stribiżew

Reputation: 626738

You may use a regex based on a tempered greedy token:

/\d+:\d+(?:(?!\d+:\d)[\s\S])*/g

The \d+:\d+ part matches one or more digits, a colon, and one or more digits. Then (?:(?!\d+:\d)[\s\S])* matches any character, zero or more times, as long as that character does not start a sequence of one or more digits followed by a colon and a digit. See this regex demo.
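As a rough sketch of how this could be used from JavaScript, assuming the sample from the question is one string with \n line breaks: the raw matches keep the line break and trailing whitespace, so the whitespace normalization below is an added step to reproduce the expected output, not part of the regex itself.

```javascript
// Sample from the question, assumed to be a single string with "\n" line breaks.
const sample =
  "1:1 wefeff qwefejä qwefjk\ndfjdf 10:2 jdskjdksdjö\n12:1 qwe qwe: qwertyå";

// Tempered greedy token: consume one char at a time while no "digits:digit" starts there.
const tempered = /\d+:\d+(?:(?!\d+:\d)[\s\S])*/g;

// Collapse whitespace runs (including the line break) and trim the ends.
const matches = sample
  .match(tempered)
  .map((m) => m.replace(/\s+/g, " ").trim());

console.log(matches);
// [ '1:1 wefeff qwefejä qwefjk dfjdf', '10:2 jdskjdksdjö', '12:1 qwe qwe: qwertyå' ]
```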

As the tempered greedy token is a resource-consuming construct, you can unroll it into a more efficient pattern like

/\d+:\d+\D*(?:\d(?!\d*:\d)\D*)*/g

See another regex demo.

Now, the tempered greedy token (?:(?!\d+:\d)[\s\S])* is turned into \D*(?:\d(?!\d*:\d)\D*)*, a pattern that matches strings linearly:

  • \D* - 0+ non-digit symbols
  • (?: - start of a non-capturing group matching zero or more sequences of:
    • \d - a digit that is...
    • (?!\d*:\d) - not followed with 0+ digits, : and a digit
    • \D* - 0+ non-digit symbols
  • )* - end of the non-capturing group.
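To check the breakdown above, here is a small sketch that runs the tempered and the unrolled patterns side by side. The second input, with stray digits that are not followed by a colon, is a made-up example to exercise the \d(?!\d*:\d) branch; agreement on these two inputs is only a spot check, not a proof of equivalence.

```javascript
const tempered = /\d+:\d+(?:(?!\d+:\d)[\s\S])*/g;
const unrolled = /\d+:\d+\D*(?:\d(?!\d*:\d)\D*)*/g;

// Sample from the question, assumed to be a single string with "\n" line breaks.
const sample =
  "1:1 wefeff qwefejä qwefjk\ndfjdf 10:2 jdskjdksdjö\n12:1 qwe qwe: qwertyå";
// Hypothetical input: "42" is a plain number, not a digits:digits marker.
const strayDigits = "1:1 abc 42 def 2:3 x";

// Return all matches as an array (empty if there are none).
const collect = (re, text) => text.match(re) || [];

const sameOnSample =
  JSON.stringify(collect(tempered, sample)) ===
  JSON.stringify(collect(unrolled, sample));
const strayMatches = collect(unrolled, strayDigits);

console.log(sameOnSample); // true
console.log(strayMatches); // [ '1:1 abc 42 def ', '2:3 x' ]
```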

Upvotes: 1

Felipe Quirós

Reputation: 440

You can include ñ/Ñ in the set or leave them out, but this should work:

\d+?:\d+? [a-zñA-ZÑ ]*

Edited:

If you want to include line breaks, you can add \n or \r to the set:

\d+?:\d+? [a-zñA-ZÑ\n ]*
\d+?:\d+? [a-zñA-ZÑ\r ]* 

Give it a try! I also tested it at https://regex101.com/

For more characters: ^[a-zA-Z0-9!@#\$%\^\&*)(+=._-]+$
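A quick sketch of how the \n variant behaves on the sample from the question, assuming it is one string with \n line breaks. As written, the set stops at ä, ö, and the colon in "qwe:", so the second pattern below adds those characters to the set; that widened set is an assumption tailored to this sample, not a general solution.

```javascript
// Sample from the question, assumed to be a single string with "\n" line breaks.
const sample =
  "1:1 wefeff qwefejä qwefjk\ndfjdf 10:2 jdskjdksdjö\n12:1 qwe qwe: qwertyå";

// The pattern with \n added to the set, as suggested above.
const asWritten = sample.match(/\d+?:\d+? [a-zñA-ZÑ\n ]*/g);
console.log(asWritten);
// [ '1:1 wefeff qwefej', '10:2 jdskjdksdj', '12:1 qwe qwe' ]

// Same idea with ä, ö, å and ":" added to the set (assumed extension for this sample).
const widened = sample
  .match(/\d+?:\d+? [a-zñA-ZÑäöå:\n ]*/g)
  .map((m) => m.replace(/\s+/g, " ").trim()); // collapse line breaks, trim ends
console.log(widened);
// [ '1:1 wefeff qwefejä qwefjk dfjdf', '10:2 jdskjdksdjö', '12:1 qwe qwe: qwertyå' ]
```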

Upvotes: 0
