Bruce

Reputation: 235

RegEx to detect if a line doesn't end in a semicolon

I'm trying to run through some code files and find lines that don't end in a semicolon.

I currently have this: ^(?:(?!;).)*$ from a bunch of Googling, and it works just fine. But now I want to expand on it so that it ignores leading whitespace and skips lines that start with specific keywords like package, or that hold only an opening or closing brace.

The end goal is to take something like this:

package example
{
    public class Example
    {
        var i = 0

        var j = 1;

        // other functions and stuff
    }
}

and have the pattern show me that var i = 0 is missing a semicolon. That's just an example; the missing semicolon could be anywhere in the class.

Any ideas? I've been fiddling for over an hour but no luck.

Thanks.

Upvotes: 7

Views: 10121

Answers (6)

Eric Leschinski

Reputation: 154063

This is the regular expression I use, with vim's regular expression engine, to highlight lines of Java code that don't end in a semicolon, excluding the kinds of lines that aren't supposed to end in one.

\(.\+[^; ]$\)\(^.*public.*\|.*//.*\|.*interface.*\|.*for.*\|.*class.*\|.*try.*\|^\s*if\s\+.*\|.*private.*\|.*new.*\|.*else.*\|.*while.*\|.*protected.*$\)\@<!
   ^          ^                                                                                                                                           ^
   |          |                                                                                                                 negative lookbehind feature 
   |          |
   |          2.  But not where such matches are preceded by these keywords
   |
   |
   1. Group of at least one character preceding a missing semicolon

Mnemonics for deciphering glyphs:

^          beginning of line
.*         Any amount of any char
\+         at least one
[^ ... ]   everything but
$          end of line
\( ... \)  group
\|         delimiter
\@<!       negative lookbehind

Which roughly translates to:

Find me all lines that don't end in a semicolon and don't have any of the above keywords/expressions to their left. It's not perfect and probably doesn't hold up to obfuscated Java, but for simple Java programs it highlights the lines that should end in a semicolon but don't.

(The original answer included a screenshot here showing the expression highlighting the matching lines.)

A link that helped me understand the concepts I needed:

https://jbodah.github.io/blog/2016/11/01/positivenegative-lookaheadlookbehind-vim/

Upvotes: 1

calvin

Reputation: 347

The key to capturing this complicated concept in a regex is to first understand how your regular expression engine/interpreter handles the following concepts:

  1. positive lookahead
  2. negative lookahead
  3. positive lookbehind
  4. negative lookbehind

Then you can begin to understand how to capture what you want, but only in such cases where what's ahead and what's behind is exactly as you specify.

str.scan(/^\s*(?=\S)(?!package.+\n|public.+\n|\/\/|\{|\})(.+)(?<!;)\s*$/)
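
To make the four lookaround concepts concrete, here is a minimal sketch in Python's re module rather than Ruby. The two sample strings and all variable names are illustrative assumptions, not part of the answer above; each check exercises one lookaround in isolation.

```python
import re

missing = "var i = 0"    # line without a trailing semicolon
present = "var j = 1;"   # line with a trailing semicolon

# 1. Positive lookahead: match a word only if " =" comes next.
#    The " =" is checked but not consumed.
assigned = re.findall(r"\w+(?= =)", missing)

# 2. Negative lookahead: succeed only if the line does not start with "package".
not_package = re.match(r"(?!package)", missing) is not None
is_package = re.match(r"(?!package)", "package example") is None

# 3. Positive lookbehind: match the end of the line only if ";" comes just before it.
ends_with_semicolon = re.search(r"(?<=;)$", present) is not None

# 4. Negative lookbehind: match the end of the line only if no ";" comes just before it.
lacks_semicolon = re.search(r"(?<!;)$", missing) is not None

print(assigned, not_package, is_package, ends_with_semicolon, lacks_semicolon)
```

None of the lookarounds consume characters; they only constrain where the rest of the pattern is allowed to match.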

Upvotes: 1

Damian Powell

Reputation: 8775

Try this:

^\s*(?!package|public|class|//|[{}]).*(?<!;\s*)$

When tested in PowerShell:

PS> (gc file.txt) -match '^\s*(?!package|public|class|//|[{}]).*(?<!;\s*)$'
        var i = 0 
PS> 
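
The variable-width lookbehind (?<!;\s*) is accepted by the .NET engine that PowerShell uses, but Python's standard re module rejects it because it requires fixed-width lookbehinds. The following is a rough Python sketch of the same idea under two assumptions of mine: [^;\s]\s*$ stands in for the lookbehind (the last non-whitespace character must not be a semicolon), and (?=\S) is added so that \s* cannot give back whitespace and slip past the keyword check on indented lines. The names MISSING_SEMICOLON and lines_missing_semicolon are hypothetical.

```python
import re

# Assumed port of the pattern above for Python's re module.
# (?=\S) pins the keyword check to the first non-whitespace character;
# [^;\s]\s*$ requires the last non-whitespace character not to be ";".
MISSING_SEMICOLON = re.compile(
    r"^\s*(?=\S)(?!package|public|class|//|[{}]).*[^;\s]\s*$"
)

def lines_missing_semicolon(text):
    """Return the lines of text that the pattern flags."""
    return [line for line in text.splitlines() if MISSING_SEMICOLON.match(line)]

sample = """package example
{
    public class Example
    {
        var i = 0

        var j = 1;

        // other functions and stuff
    }
}"""

print(lines_missing_semicolon(sample))
```

This version also skips blank lines, since it requires at least one non-whitespace character. Like the original, it treats the keywords as plain prefixes, so an identifier such as packageCount at the start of a line would be skipped too; adding \b after the keywords would tighten that.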

Upvotes: 1

Eliot Ball

Reputation: 728

You are trying to match lines that possibly begin with whitespace ^\s*, then don't start with one of a particular set of words, for example (?!package|class), then have anything .* but then don't end in a semicolon (or a semicolon with whitespace after it) [^;]\s*.

^\s*(?!package|class).*?[^;]\s*$

Note that I added parentheses around a section of the regex.

Upvotes: 0

Eliot Ball

Reputation: 728

If you want a line that doesn't end in a semicolon you can ask for any amount of anything .* followed by one character that isn't a semicolon [^;] followed possibly by some whitespace \s* up to the end of the line $. So you have:

.*[^;]\s*$

Now if you don't want whitespace at the beginning you need to ask for the beginning of the line ^ followed by any character that isn't whitespace [^\s] followed by the regex from earlier:

^[^\s].*[^;]\s*$

If you don't want it to start with whitespace or with a keyword like package or, say, class, you can assert that the line doesn't begin with any of those three things. The regex that matches any of them is (?:\s|package|class), and the negative lookahead (?!\s|package|class) succeeds only where none of them comes next, without consuming any characters. Note the !. So you now have:

^(?!\s|package|class).*[^;]\s*$
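
One way to check each step is to run the three patterns over the sample from the question. The sketch below does that in Python; SAMPLE, STEPS, and matching_lines are names I made up for illustration, and each line is tested on its own so that \s never spans a line break.

```python
import re

SAMPLE = """package example
{
    public class Example
    {
        var i = 0

        var j = 1;

        // other functions and stuff
    }
}"""

# The three patterns from this answer, in the order they were built up.
STEPS = {
    "no trailing semicolon": r".*[^;]\s*$",
    "no leading whitespace": r"^[^\s].*[^;]\s*$",
    "no leading keyword": r"^(?!\s|package|class).*[^;]\s*$",
}

def matching_lines(pattern, text):
    """Return the lines of text that the pattern matches from their start."""
    rx = re.compile(pattern)
    return [line for line in text.splitlines() if rx.match(line)]

for name, pattern in STEPS.items():
    print(name, matching_lines(pattern, SAMPLE))
```

On this sample the final pattern reports only the two unindented brace lines: because it rejects any line that starts with whitespace, the indented var i = 0 is skipped. For indented code it is probably better to consume the leading whitespace with ^\s* and exclude the braces explicitly.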

Upvotes: 3

Petar Ivanov

Reputation: 93060

For just lines that don't end in a semicolon, this is simpler:

.*[^;]$

If you also want to exclude lines that start with a space:

^[^ ].*[^;]$

Upvotes: 0
