Reputation: 53
I have a regex formula that I'm using to find specific patterns in my data. Specifically, it starts by looking for characters between "{}" brackets, and looks for "p. " and grabs the number after. I noticed that, in some instances, if there's not a "p. " value shortly after the brackets, it will continue to go through the next brackets and grab the number after.
For example, here is my sample data:
{Hello}, [1234] (Test). This is sample data used to answer a question {Hello2} [Ch.8 p. 87 gives more information about...
Here is my code:
\{(.*?)\}(.*?)p\. ([0-9]+)
I want it to return this only:
{Hello2} [Ch.8 p. 87
But it returns this:
{Hello}, [123:456] (Test). This is sample data used to answer a
question {Hello2} [Ch.8 p. 87
Is there a way to exclude strings that contain, let's say, "{"?
Upvotes: 5
Views: 5215
Reputation: 163207
Your pattern first matches from { to }, and then .*? matches lazily, expanding one character at a time until it can match a p, a dot, a space and 1+ digits.
It can do that because the dot can also match { and }.
You could use a negated character class, [^{}], so that the match cannot cross a brace:
\{[^{}]*\}[^{}]+p\. [0-9]+
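To see the difference on the sample text from the question, here is a minimal JavaScript sketch. The variable names are made up for illustration, and the capture groups in the second pattern are an addition carried over from the question's pattern (the pattern above has none), so treat this as one possible way to try it out rather than the definitive form:

```javascript
// Sample text from the question.
const text =
  "{Hello}, [1234] (Test). This is sample data used to answer a question " +
  "{Hello2} [Ch.8 p. 87 gives more information about...";

// Original pattern: the dot can match "{" and "}", so the lazy .*? keeps
// expanding past the next brace group until it reaches "p. <digits>".
const original = /\{(.*?)\}(.*?)p\. ([0-9]+)/;

// Negated character classes: [^{}] never matches a brace, so a match
// cannot run from one {...} group into the next.
// (Capture groups added here for illustration only.)
const fixed = /\{([^{}]*)\}([^{}]+)p\. ([0-9]+)/;

const originalMatch = text.match(original);
const fixedMatch = text.match(fixed);

console.log(originalMatch[0]);
// → {Hello}, [1234] (Test). This is sample data used to answer a question {Hello2} [Ch.8 p. 87
console.log(fixedMatch[0]);
// → {Hello2} [Ch.8 p. 87
console.log(fixedMatch[1], fixedMatch[3]);
// → Hello2 87
```

With the negated classes, the attempt that starts at {Hello} fails because there is no "p. " followed by digits before the next brace, so the engine moves on and matches from {Hello2}.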
Upvotes: 8
Reputation: 15120
Based on your example text, you may be able to simplify your regex a bit and avoid matching a second open curly brace before you match the page number (unless you have some other purpose for the capture groups). For example:
{[^{]*p\.\s\d+
{ matches an open curly brace
[^{]* matches all following characters except for another open curly brace
p\.\s\d+ matches "p" followed by a period, a space and one or more digits
Upvotes: 0
Reputation: 40024
Here's how you do it in Java. The regex should be fairly universal.
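import java.util.regex.Matcher;
import java.util.regex.Pattern;

String test = "{Hello2} [Ch.8 p. 87 gives more information about..";
// Group 1 wraps the whole match: a {...} block, then any text through a "p" and the digits after it
String pat = "(\\{.*?\\}.*p.*?\\d+)";
Matcher m = Pattern.compile(pat).matcher(test);
if (m.find()) {
    System.out.println(m.group(1));
}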
More specific patterns can be provided if more is known about your data. For example, does each {} block of information start on a separate line? What does the data look like, and what do you want to ignore?
Upvotes: 0
Reputation: 27723
Your expression seems to work as written. My guess is that you want to capture only the desired output and leave everything before it uncaptured, which can be done with a slight modification of your original expression:
(?:[\s\S]*)(\{(.*?)\}(.*?)p\. [0-9]+)
or this expression:
(?:[\s\S]*)(\{.*)
jex.im visualizes regular expressions:
const regex = /(?:[\s\S]*)(\{.*)/gm;
const str = `{Hello}, [123:456] (Test). This is sample data used to answer a
question {Hello2} [Ch.8 p. 87`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
Upvotes: 0