malibu2

Reputation: 53

Regex to exclude specific characters

I have a regex that I'm using to find specific patterns in my data. It starts by matching the characters between "{}" brackets, then looks for "p. " and grabs the number after it. I noticed that when there is no "p. " value shortly after the brackets, the match continues through the next set of brackets and grabs the number after those instead.

For example, here is my sample data:

{Hello}, [1234] (Test). This is sample data used to answer a question {Hello2} [Ch.8 p. 87 gives more information about...

Here is my code:

\{(.*?)\}(.*?)p\. ([0-9]+)

I want it to return this only:

{Hello2}  [Ch.8 p. 87

But it returns this:

{Hello},  [123:456] (Test).  This is sample data used to answer a
question {Hello2}  [Ch.8 p. 87

Is there a way to exclude strings that contain, let's say, "{"?

Upvotes: 5

Views: 5215

Answers (4)

The fourth bird

Reputation: 163207

Your pattern first matches from { to }. The non-greedy .*? then expands one character at a time until it can match a p, a dot, a space and 1+ digits.

It can do that because the dot can also match { and }, so nothing stops it from running into the next pair of brackets.

You could use the negated character class [^{}], which matches any character except { and }:

\{[^{}]*\}[^{}]+p\. [0-9]+
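
As a rough check of the difference, here is a minimal JavaScript sketch that runs both patterns against the sample text from the question. The capture groups in the second pattern (label and page number) are an addition for illustration and are not part of the pattern above; the variable names are likewise made up for this example.

```javascript
// Sample text taken from the question.
const sample =
  "{Hello}, [1234] (Test). This is sample data used to answer a question " +
  "{Hello2} [Ch.8 p. 87 gives more information about...";

// Original pattern: the dot can cross into the next {...} block.
const original = /\{(.*?)\}(.*?)p\. ([0-9]+)/;

// Negated character classes keep the match inside a single segment.
// Groups 1 (label) and 2 (page) are added here for illustration only.
const fixed = /\{([^{}]*)\}[^{}]+p\. ([0-9]+)/;

const before = sample.match(original);
const after = sample.match(fixed);

console.log(before[0]); // starts at "{Hello}" and runs through "p. 87"
console.log(after[0]); // {Hello2} [Ch.8 p. 87
console.log(after[1], after[2]); // Hello2 87
```

Note that [^{}]+ is greedy, so if several "p. N" references follow the same brackets, this sketch would stop at the last one before the next brace.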

Upvotes: 8

benvc

Reputation: 15120

Based on your example text, you may be able to simplify your regex a bit and avoid matching a second open curly brace before you match the page number (unless you have some other purpose for the capture groups). For example:

{[^{]*p\.\s\d+
  • { match an open curly brace
  • [^{]* match all following characters except for another open curly brace
  • p\.\s\d+ match "p" followed by period, space and one or more digits
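
The pieces listed above can be tried out with a short JavaScript sketch. The opening brace is escaped here as a portability assumption (some engines reject a bare {), the g flag is added to collect every hit, and the second input string is a made-up example rather than data from the question.

```javascript
// Same idea as the pattern above, with the brace escaped and the
// global flag added so that matchAll can collect every hit.
const pagePattern = /\{[^{]*p\.\s\d+/g;

// Helper name is hypothetical; it returns the full text of each match.
function findPageRefs(text) {
  return [...text.matchAll(pagePattern)].map((m) => m[0]);
}

// Sample text from the question: only the second block has a page number.
const questionText =
  "{Hello}, [1234] (Test). This is sample data used to answer a question " +
  "{Hello2} [Ch.8 p. 87 gives more information about...";
console.log(findPageRefs(questionText)); // [ '{Hello2} [Ch.8 p. 87' ]

// Made-up text: the middle block has no page number and is skipped.
const madeUpText = "{A} p. 1 {B} [x] {C} see p. 22";
console.log(findPageRefs(madeUpText)); // [ '{A} p. 1', '{C} see p. 22' ]
```

Because only the opening brace is excluded, a closing brace may still appear anywhere in the match, which is fine for data shaped like the sample.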

Upvotes: 0

WJS

Reputation: 40024

Here's how you do it in Java. The regex should be fairly universal.

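      import java.util.regex.Matcher;
      import java.util.regex.Pattern;

      String test = "{Hello2} [Ch.8 p. 87 gives more information about..";
      // Group 1 spans from the opening brace through the digits after "p"
      String pat = "(\\{.*?\\}.*p.*?\\d+)";
      Matcher m = Pattern.compile(pat).matcher(test);
      if (m.find()) {
         System.out.println(m.group(1));
      }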

More specific ones can be provided if more is known about your data. For example, does each {} of information start on a separate line? What does the data look like, and what do you want to ignore?

Upvotes: 0

Emma

Reputation: 27723

Your expression seems to be working fine. My guess is that you want to capture only the desired output and leave everything before it uncaptured, which can be done with a slight modification of your original expression:

(?:[\s\S]*)(\{(.*?)\}(.*?)p\. [0-9]+)

or this expression:

(?:[\s\S]*)(\{.*)

Test

const regex = /(?:[\s\S]*)(\{.*)/gm;
const str = `{Hello},  [123:456] (Test).  This is sample data used to answer a
question {Hello2}  [Ch.8 p. 87`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
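
The first expression can be checked in a similar way. The following is a minimal sketch, assuming the same two-line input; the variable names are made up for this example. The greedy [\s\S]* prefix consumes as much as possible, so group 1 ends up holding the last block that is followed by "p. " and digits.

```javascript
// Two-line input modelled on the question's output sample.
const input = `{Hello},  [123:456] (Test).  This is sample data used to answer a
question {Hello2}  [Ch.8 p. 87`;

// The greedy prefix swallows everything it can, then backtracks until
// the capturing group can match, so the last qualifying block wins.
const lastBlock = /(?:[\s\S]*)(\{(.*?)\}(.*?)p\. [0-9]+)/;

const found = input.match(lastBlock);
console.log(found[1]); // {Hello2}  [Ch.8 p. 87
console.log(found[2]); // Hello2
```

A consequence of this design is that it yields at most one result per input, so it may not suit data where several blocks each carry their own page number.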

Upvotes: 0
