Reputation: 31212
I am writing a simple Java source file parser in Python. The main objective is to extract a list of method declarations. A method declaration starts with public|private|protected (I assume there are no friendly, i.e. package-private, methods without an access modifier, which holds in my code base) and ends with a {, but it can't contain a ; (and it may span multiple lines).
So my current regex pattern looks like this:
((public|private|protected).*\n*.*?({|;))
I am not sure how to say that the entire match can't contain a ;, so instead I tried to match something that ends with either { or ;, whichever comes first, using a non-greedy quantifier. However, that doesn't work; here is a chunk where it fails:
private static final AtomicInteger refCount = new AtomicInteger(0);
protected int getSomeVar() {
You can see that a variable declaration starting with private, but without a {, comes before the method declaration. The regex returns both lines as a single match, whereas I wanted two matches so that I could discard the variable declaration in separate, non-regex logic. But if you know how to exclude a ; before the {, that would work too.
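A minimal sketch reproducing the failure, assuming Python 3's standard `re` module and the two-line chunk above as input. The greedy `.*` runs to the end of the first line and swallows its `;`, then `\n*` consumes the newline and `.*?` stops at the first `{`, so both declarations come back as one match.

```python
import re

# The pattern from the question.
ORIGINAL = r"((public|private|protected).*\n*.*?({|;))"

# The failing chunk from the question.
SAMPLE = (
    "private static final AtomicInteger refCount = new AtomicInteger(0);\n"
    "protected int getSomeVar() {\n"
)

# Greedy .* eats the rest of line 1 (including its ';'), \n* eats the
# newline, and .*? then expands lazily until it reaches the '{' on line 2.
matches = [m.group(1) for m in re.finditer(ORIGINAL, SAMPLE)]
print(len(matches))
print(repr(matches[0]))
```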
Essentially, how do I tell a Python regex that a certain character (or a subpattern) must not occur anywhere within the main pattern?
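The subpattern half of this question goes beyond what a negated character class can express, since `[^...]` only excludes single characters. One common technique is the tempered greedy token `(?:(?!X)[\s\S])*`, which consumes one character at a time only at positions where `X` does not begin. In the sketch below, the rule that the word `throws` must not appear before the `{` is purely hypothetical, chosen only to show a multi-character exclusion alongside `;` and `{`.

```python
import re

# Tempered greedy token: (?:(?!FORBIDDEN)[\s\S])* consumes one character at a
# time, but only where FORBIDDEN does not start. [\s\S] matches any character,
# including newlines.
# Hypothetical rule for illustration: match up to '{' only if neither ';'
# nor the multi-character word "throws" occurs before it.
NO_THROWS_RE = re.compile(
    r"(?:public|private|protected)(?:(?!;|\{|\bthrows\b)[\s\S])*\{"
)

SAMPLE = (
    "private static final AtomicInteger refCount = new AtomicInteger(0);\n"
    "protected int getSomeVar() {\n"
    "}\n"
    "public void load(String path) throws IOException {\n"
    "}\n"
)

# No capturing groups, so findall returns the full matches.
found = NO_THROWS_RE.findall(SAMPLE)
print(found)
```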
Reputation: 31212
This finally worked:
((public|private|protected)[^;{]*?{)
Notice that the character class excludes both ; and {, so the match cannot run past the first {.
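A quick check of this pattern, assuming the same chunk plus a hypothetical multi-line method signature. Because `[^;{]` cannot cross a `;`, the field declaration never matches, and because the class also matches newlines, the two-line signature is captured whole.

```python
import re

# Pattern from this answer: after the modifier, consume only characters
# that are neither ';' nor '{', then require the opening '{'.
METHOD_RE = re.compile(r"((public|private|protected)[^;{]*?{)")

# The chunk from the question plus a hypothetical multi-line signature.
SAMPLE = (
    "private static final AtomicInteger refCount = new AtomicInteger(0);\n"
    "protected int getSomeVar() {\n"
    "    return refCount.get();\n"
    "}\n"
    "public void setSomeVars(int a,\n"
    "        int b) {\n"
    "}\n"
)

methods = [m.group(1) for m in METHOD_RE.finditer(SAMPLE)]
for method in methods:
    print(repr(method))
```

Like the original, the pattern has no word boundary before the modifier, so an identifier such as republic would also start a match; putting `\b` in front of the group avoids that.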
Reputation: 189317
You can use a negated character class to say "any character (including a newline) except a left brace or a semicolon".
((public|private|protected)[^;{]*\n*[^;{]*?({|;))
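A sketch of this pattern combined with the non-regex filtering step the question describes; the `endswith` check is an assumption about how that step might look. Since `[^;{]` already matches newlines in Python, the `\n*[^;{]*?` part should add nothing, and the shorter `SIMPLE_RE` (a hypothetical simplification) finds the same matches on this input.

```python
import re

# Pattern from this answer: [^;{] is a negated character class, so no ';'
# or '{' can appear before the terminating ({|;).
DECL_RE = re.compile(r"((public|private|protected)[^;{]*\n*[^;{]*?({|;))")

# Hypothetical simplification: [^;{] already matches '\n' in Python, so the
# \n*[^;{]*? part is redundant.
SIMPLE_RE = re.compile(r"(?:public|private|protected)[^;{]*[;{]")

SAMPLE = (
    "private static final AtomicInteger refCount = new AtomicInteger(0);\n"
    "protected int getSomeVar() {\n"
)

decls = [m.group(1) for m in DECL_RE.finditer(SAMPLE)]
simple = [m.group(0) for m in SIMPLE_RE.finditer(SAMPLE)]

# Separate non-regex step: keep declarations ending in '{' (methods) and
# discard those ending in ';' (fields and other statements).
methods = [d for d in decls if d.endswith("{")]
print(decls)
print(methods)
```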