Reputation: 1292
I'm trying to get record1, record2, record3 from the text:
"Record1 ANY TEXT 123 4 5 Record2 ANOTHER TEXT 90-8098 Record3 MORE TEXT ASD 123"
Each record appears zero or one time. I use the pattern:
(Record1.*)?(Record2.*)?(Record3.*)?
If every record appears, I get:
matcher.group(1) == "Record1 ANY TEXT 123 4 5 Record2 ANOTHER TEXT 90-8098 Record3 MORE TEXT ASD 123"
matcher.group(2) == null
matcher.group(3) == null
If I instead use the pattern:
(Record1.*)(Record2.*)(Record3.*)
matcher.group(1) == "Record1 ANY TEXT 123 4 5 "
matcher.group(2) == "Record2 ANOTHER TEXT 90-8098 "
matcher.group(3) == "Record3 MORE TEXT ASD 123"
That's exactly what I want, but each record may appear zero times, so this regexp isn't suitable.
What pattern should I use?
Upvotes: 7
Views: 8316
Reputation:
If your text is tightly packed and composed only of Record entries, why not use split (if Java calls it split)?
split regex:
# "(?:(?!Record)[\\S\\s])*(Record[\\S\\s]*?)(?=Record|$(?!\\n))"
(?:
(?! Record )
[\S\s]
)*
( Record [\S\s]*? )
(?=
Record
| $ (?! \n )
)
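As a rough sketch of how this regex might be applied in Java: the version below walks the input with Matcher.find() and collects group 1 of each match, rather than calling String.split. That choice is an assumption about the intent here, and the class and method names (RecordFinder, findRecords) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RecordFinder {
    // The regex from this answer, written as a Java string literal.
    static final Pattern RECORD = Pattern.compile(
        "(?:(?!Record)[\\S\\s])*(Record[\\S\\s]*?)(?=Record|$(?!\\n))");

    // Collects every chunk that starts with "Record" and runs up to
    // the next "Record" or the end of the input.
    static List<String> findRecords(String text) {
        List<String> records = new ArrayList<>();
        Matcher m = RECORD.matcher(text);
        while (m.find()) {
            records.add(m.group(1));
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "Record1 ANY TEXT 123 4 5 Record2 ANOTHER TEXT 90-8098 "
            + "Record3 MORE TEXT ASD 123";
        for (String record : findRecords(text)) {
            System.out.println("[" + record + "]");
        }
    }
}
```

Note that this approach returns whatever Record chunks are present, in order of appearance; it does not check that each record occurs at most once or in a particular order.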
Upvotes: 0
Reputation: 30273
You want to make your quantifiers non-greedy, and you want to use anchors:
^.*?(Record1.*?)?(Record2.*?)?(Record3.*?)?$
In your original expression, your .* was consuming everything to the end of the string, because that's how regular expressions behave by default (called greedy matching). Since the second and third groups were optional, the engine had no reason to backtrack: the first .* swallowed the rest of the input and the overall match still succeeded.
By adding a ? after any quantifier, e.g. *? or +? or ?? or {m,n}?, you instruct the engine to match as little as possible, i.e. invoke non-greedy matching.
So, why the anchors? Well, if you invoke non-greedy matching, the engine's going to try to match as little as possible. So it'd match nothing, since all your groups are optional! By forcing the whole expression to match from the beginning, ^, to the end, $, you force the regular expression to find some way to match as few characters as possible via .*?, but still match as much as needed to get all the details.
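To try this from Java, a minimal sketch could look like the following. The pattern is the one given above; the class and method names (RecordGroups, extract) and the second test string are only illustrative. A group that did not take part in the match comes back as null from Matcher.group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RecordGroups {
    // The anchored, non-greedy pattern from this answer.
    static final Pattern PATTERN =
        Pattern.compile("^.*?(Record1.*?)?(Record2.*?)?(Record3.*?)?$");

    // Returns groups 1..3; an entry is null when that record is absent.
    static String[] extract(String text) {
        Matcher m = PATTERN.matcher(text);
        String[] groups = new String[3];
        if (m.find()) {
            for (int i = 0; i < 3; i++) {
                groups[i] = m.group(i + 1);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String all = "Record1 ANY TEXT 123 4 5 Record2 ANOTHER TEXT 90-8098 "
            + "Record3 MORE TEXT ASD 123";
        String noSecond = "Record1 ANY TEXT 123 4 5 Record3 MORE TEXT ASD 123";
        for (String text : new String[] {all, noSecond}) {
            String[] g = extract(text);
            for (int i = 0; i < g.length; i++) {
                System.out.println("group(" + (i + 1) + ") = " + g[i]);
            }
        }
    }
}
```

With all three records present, the groups should match the output the question asks for; with Record2 missing, group(2) is null while the other two are still captured. Since . does not match line terminators by default, multi-line input would need Pattern.DOTALL.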
Upvotes: 8