Reputation: 310
I have a text containing one person's details but information about many different pets. I am looking for a way to select only the person's details using regex.
TEXT :
# Person
---
Name: Nick King
Age: 18
Speech: "Hello!! How are you? Me & you are different. I'm the #1"
# Pet = Dog
---
Name: Bill
# Pet = Cat
---
Name: Zacky
REGEX :
#\s*Person(\n|.)+(?=#\s*Pet)
The regex always captures up to the last pet because of the any-character (.) token I've used.
How can I stop at the first pet?
Assume that "Dog" won't always be the first pet in the list.
Upvotes: 3
Views: 113
Reputation: 163207
You are using (\n|.)+ which matches too much, and it is also very inefficient because it alternates between a newline and any character at every position.
You could match # Person and then repeat matching all the lines that do not start with # Pet:
#\s*Person(?:\r?\n(?!#\s*Pet\b).*)*
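As a quick check, the pattern above can be run against the sample text from the question. This is a minimal sketch in Python (the question does not name a language, so the choice of Python's re module and the names text, pattern, and person_block are assumptions made for illustration):

```python
import re

# Sample text copied from the question.
text = '''# Person
---
Name: Nick King
Age: 18
Speech: "Hello!! How are you? Me & you are different. I'm the #1"
# Pet = Dog
---
Name: Bill
# Pet = Cat
---
Name: Zacky
'''

# Match "# Person", then consume each following line
# unless that line starts with "# Pet".
pattern = re.compile(r'#\s*Person(?:\r?\n(?!#\s*Pet\b).*)*')

match = pattern.search(text)
person_block = match.group(0) if match else ""
print(person_block)
```

The "#1" inside the Speech line does not end the match, because the lookahead is only tested at the start of a line.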
#\s*Person - Match # Person
(?: - Non capturing group
\r?\n - Match a newline
(?!#\s*Pet\b).* - Match the whole line when not starting with # Pet
)* - Close group and repeat 0+ times
Upvotes: 2
Reputation: 1757
Regex might not be the best solution to this sort of problem - there are YAML interpreters you could use.
If you're committed to using a regex, there is a simple solution: being ungreedy.
In your original regex, you had:
#\s*Person(\n|.)+(?=#\s*Pet)
In this, (\n|.)+ was matching as many characters as possible before conducting the Pet lookahead.
If you introduce ? after the + to make this group read (\n|.)+?, you will get as few characters as possible before conducting the lookahead.
#\s*Person(\n|.)+?(?=#\s*Pet)
Regex101 describes +? as follows:
+?
Quantifier — Matches between one and unlimited times, as few times as possible, expanding as needed (lazy)
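To see the difference between the greedy and lazy versions, both can be run against the sample text from the question. This is a rough sketch in Python (an assumed language, since the question does not specify one; the variable names are illustrative only):

```python
import re

# Sample text copied from the question.
text = '''# Person
---
Name: Nick King
Age: 18
Speech: "Hello!! How are you? Me & you are different. I'm the #1"
# Pet = Dog
---
Name: Bill
# Pet = Cat
---
Name: Zacky
'''

# Greedy: runs to the end of the text, then backtracks to the LAST "# Pet".
greedy = re.search(r'#\s*Person(\n|.)+(?=#\s*Pet)', text)
# Lazy: expands one character at a time and stops at the FIRST "# Pet".
lazy = re.search(r'#\s*Person(\n|.)+?(?=#\s*Pet)', text)

greedy_block = greedy.group(0)
lazy_block = lazy.group(0)

print("greedy contains the Dog section:", "Pet = Dog" in greedy_block)
print("lazy contains any pet section:", "Pet" in lazy_block)
```

Note that the lazy match includes the newline just before the first # Pet line, so you may want to strip trailing whitespace from the result.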
As well as operating a local ungreedy switch, you can globally set quantifiers to be ungreedy by using the U flag, in flavors that support it (PCRE, for example).
Note that this reverses greediness globally, so if you set the U flag as well as using +?, you will again be matching as many times as possible. Use one solution or the other.
Upvotes: 1