amphibient

Reputation: 31212

Python regex pattern definition excluding a character

I am writing a simple Java source file parser in Python. The main objective is to extract a list of method declarations. A method declaration starts with public|private|protected (I assume there are no package-private, a.k.a. "friendly", methods without an access modifier, which is acceptable in my code base) and ends with a {, but it can't contain a ; (and it may span multiple lines).

So my current regex pattern looks like:

((public|private|protected).*\n*.*?({|;))

I am not sure how to say that the entire match can't contain a ;, so I tried to ask for something that ends with either { or ;, whichever comes first, non-greedily. However, that doesn't work. Here is a chunk where it fails:

private static final AtomicInteger refCount = new AtomicInteger(0);

protected int getSomeVar() {

You can see that there is a variable declaration before the method declaration. It starts with private but has no {. The whole chunk is returned as one match, whereas I wanted two matches, so that I could discard the variable declaration in separate non-regex logic. If you know how to exclude a ; before the {, that would work too.

Essentially, how do I specify in a Python regex that a certain character (or a sub-pattern) must not occur within the main pattern?
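
For reference, a minimal sketch that reproduces the failure with the pattern and chunk quoted above (the variable names are only illustrative). The first .* is greedy and . also matches ;, so it runs to the end of the first line, and the match then continues across the newlines until the {:

```python
import re

# Pattern exactly as quoted in the question.
ORIGINAL_RE = re.compile(r"((public|private|protected).*\n*.*?({|;))")

# The failing chunk from the question.
chunk = (
    "private static final AtomicInteger refCount = new AtomicInteger(0);\n"
    "\n"
    "protected int getSomeVar() {\n"
)

# group(0) is the whole match; findall() would return tuples of the groups.
matches = [m.group(0) for m in ORIGINAL_RE.finditer(chunk)]

print(len(matches))      # one match instead of the two that were wanted
print(repr(matches[0]))  # spans the field declaration and the method header
```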

Upvotes: 1

Views: 204

Answers (2)

amphibient

Reputation: 31212

This finally worked:

((public|private|protected)[^;{]*?{)

Notice how I had to exclude both ; and { before the first {.
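
A minimal sketch of how this pattern might be used, with the redundant outer group dropped; the sample Java source and the variable names are made up for illustration. Because a negated character class also matches newlines in Python, a declaration spread over several lines is still found. Note that the pattern as written would presumably also match class declarations such as public class Foo {, so those may need separate filtering.

```python
import re

# Pattern from the answer above, without the redundant outer group.
METHOD_RE = re.compile(r"(public|private|protected)[^;{]*?{")

# Hypothetical sample source, for illustration only.
java_source = """\
private static final AtomicInteger refCount = new AtomicInteger(0);

protected int getSomeVar() {
    return someVar;
}

public void setBoth(int first,
                    int second) {
}
"""

# The field declaration is skipped because a ; appears before any {.
declarations = [m.group(0) for m in METHOD_RE.finditer(java_source)]

for declaration in declarations:
    print(declaration)
```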

Upvotes: 1

tripleee

Reputation: 189317

You can use a negated character class to say "any character except (newline or) left brace or semicolon".

((public|private|protected)[^;{]*\n*[^;{]*?({|;))
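
A minimal sketch of this pattern applied to the chunk from the question, assuming the goal is to get two separate matches and then discard the one ending in ; outside the regex. The variable names are only illustrative. In Python, [^;{] already matches a newline, so the \n* in the middle is likely redundant, but the pattern is kept as written here.

```python
import re

# Pattern from the answer above, kept as written.
DECL_RE = re.compile(r"((public|private|protected)[^;{]*\n*[^;{]*?({|;))")

# The chunk from the question.
chunk = (
    "private static final AtomicInteger refCount = new AtomicInteger(0);\n"
    "\n"
    "protected int getSomeVar() {\n"
)

# Each declaration now stops at its own first ; or {.
all_matches = [m.group(0) for m in DECL_RE.finditer(chunk)]

# Non-regex filtering step: keep only matches that end with an opening brace.
method_declarations = [text for text in all_matches if text.endswith("{")]

print(all_matches)
print(method_declarations)
```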

Upvotes: 2
