Nick King

Reputation: 310

Stop at the first occurrence in Regex

I have a text containing the details of exactly one person, followed by information about several different pets. I am looking for a way to select only the person's details using Regex.

Here is what I've tried

TEXT :

# Person
---
Name: Nick King 
Age: 18
Speech: "Hello!! How are you? Me & you are different. I'm the #1"

# Pet = Dog
---
Name: Bill

# Pet = Cat
---
Name: Zacky

REGEX :

#\s*Person(\n|.)+(?=#\s*Pet)

The regex always matches all the way up to the last pet because of the any-character (.) token I've used.

How can I stop at the first pet, given that "Dog" won't always be the first pet in the list?

Upvotes: 3

Views: 113

Answers (2)

The fourth bird

Reputation: 163207

You are using (\n|.)+, which not only matches too much but is also very inefficient, because at every position it alternates between a newline and any other character.

Instead, you could match # Person and then keep matching whole lines as long as they do not start with # Pet:

#\s*Person(?:\r?\n(?!#\s*Pet\b).*)*
  • #\s*Person Match # Person
  • (?: Non capturing group
    • \r?\n Match a newline
    • (?!#\s*Pet\b).* Match the whole line when not starting with # Pet
  • )* Close group and repeat 0+ times
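
As a quick check, here is a minimal sketch of that pattern in Python's re module. The choice of Python is an assumption (the question does not name a language), and TEXT is a copy of the sample from the question with the trailing space after "Nick King" dropped.

```python
import re

# Sample text from the question (assumed to be held in a single string).
TEXT = """# Person
---
Name: Nick King
Age: 18
Speech: "Hello!! How are you? Me & you are different. I'm the #1"

# Pet = Dog
---
Name: Bill

# Pet = Cat
---
Name: Zacky
"""

# Match "# Person", then consume following lines one at a time,
# refusing any line that starts with "# Pet".
PERSON_BLOCK = re.compile(r"#\s*Person(?:\r?\n(?!#\s*Pet\b).*)*")

match = PERSON_BLOCK.search(TEXT)
# The match also includes the blank line before the first pet,
# so strip trailing whitespace before using it.
person = match.group(0).rstrip()
print(person)
```

Because the lookahead is only checked at the start of each line, the "#1" inside the Speech line does not end the match early.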


Upvotes: 2

Robin James Kerrison

Reputation: 1757

Regex might not be the best solution to this sort of problem - there are YAML interpreters you could use.

If you're committed to using a regex, there is a simple solution: being ungreedy.

Locally Ungreedy

In your original regex, you had:

#\s*Person(\n|.)+(?=#\s*Pet)

In this, (\n|.)+ was matching as many characters as possible before conducting the Pet lookahead.

If you introduce ? after the + so that this group reads (\n|.)+?, you will get as few characters as possible before conducting the lookahead.

#\s*Person(\n|.)+?(?=#\s*Pet)

Regex101 describes +? as follows:

+? Quantifier — Matches between one and unlimited times, as few times as possible, expanding as needed (lazy)
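
To see the difference side by side, here is a small sketch in Python's re module. Python is an assumption here, as the question does not name a language, and TEXT is a copy of the sample from the question. Both patterns are the ones shown above, unchanged.

```python
import re

# Sample text from the question (assumed to be held in a single string).
TEXT = """# Person
---
Name: Nick King
Age: 18
Speech: "Hello!! How are you? Me & you are different. I'm the #1"

# Pet = Dog
---
Name: Bill

# Pet = Cat
---
Name: Zacky
"""

# Greedy: (\n|.)+ grabs as much as it can, then backtracks only as far
# as the LAST position where "# Pet" follows.
greedy = re.search(r"#\s*Person(\n|.)+(?=#\s*Pet)", TEXT).group(0)

# Lazy: (\n|.)+? expands one character at a time and stops at the
# FIRST position where "# Pet" follows.
lazy = re.search(r"#\s*Person(\n|.)+?(?=#\s*Pet)", TEXT).group(0)

print("Greedy match contains the first pet:", "Bill" in greedy)
print("Lazy match contains the first pet:", "Bill" in lazy)
```

The lazy match still ends with the blank line that precedes the first pet header, so you may want to strip trailing whitespace from the result.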

Globally Ungreedy

As well as making an individual quantifier ungreedy, you can make all quantifiers ungreedy by default by setting the U flag, in flavors that support it (such as PCRE).

Note that this reverses greediness globally, so if you set the U flag as well as using +?, you will again be matching as many times as possible. Use one solution or the other.

Upvotes: 1
