alejandro5042

Reputation: 851

General purpose plain-text linting tool

I'm looking for a command-line tool where I can specify regex patterns (or similar) for certain file extensions (e.g. cs files, js files, xaml files) that can provide errors/warnings when run, like during a build. These would scan plain-text source code of all types.

I know there are tools for specific languages... I plan on using those too. This tool is for quick patterns we want to flag where we don't want to invest in writing a Roslyn rule, for example. I'd like to flag certain patterns or API usages in an easy way where anyone can add a new rule without thinking too hard. Oftentimes we don't add rules because it is hard.

Features like source tokenization are a bonus. Open-source / free is a mega bonus.

Is there such a tool?

Upvotes: 4

Views: 555

Answers (2)

Peter Tillemans

Reputation: 35331

If you want to go old-skool, you can dust off Awk for this one.

It scans a file line by line (for some configurable definition of line, with a sane default), cuts each line into pieces (on whitespace, IIRC), applies a set of regexes, and fires the code attached to each matching regex. There are also special BEGIN and END patterns that fire before the first line and after the last line of input, which is handy for printing headers/footers.

It seems to be what you want, but IMHO a perl or ruby script is easier, and scripting languages replaced Awk for me a long time ago. But it IS simple and straightforward for your use-case AFAICT.
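For illustration, the same pattern/action idea can be sketched as a small script. This is only a minimal sketch in Python, not an existing tool: the `RULES` table, the two example patterns, and the function names `lint_text` and `lint_tree` are all made up for the example, and it assumes rules are keyed by file extension as the question describes.

```python
import re
from pathlib import Path

# Hypothetical rule table: file extension -> list of (regex, message).
# Adding a rule means adding one line here.
RULES = {
    ".cs": [(re.compile(r"\bThread\.Sleep\("), "avoid Thread.Sleep")],
    ".js": [(re.compile(r"\bconsole\.log\("), "remove console.log before commit")],
}

def lint_text(text, suffix, name="<text>"):
    """Return 'file:line: warning: message' strings for every rule hit."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, message in RULES.get(suffix, []):
            if pattern.search(line):
                findings.append(f"{name}:{lineno}: warning: {message}")
    return findings

def lint_tree(root):
    """Walk a directory and lint every file whose extension has rules."""
    findings = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in RULES:
            text = path.read_text(encoding="utf-8", errors="replace")
            findings.extend(lint_text(text, path.suffix, str(path)))
    return findings
```

A build step could call `lint_tree(".")`, print the returned lines, and fail when the list is non-empty. Like Awk, this works purely line by line, so it will also flag matches inside comments and strings.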

Upvotes: 2

Ira Baxter

Reputation: 95316

Our Source Code Search Engine (SCSE) can do this.

SCSE lexes (using language-accurate tokenization including skipping language-specific whitespace but retaining comments) a set of source files, and then builds a token index for each token type. One can provide SCSE with token-based search queries such as:

  'if' '(' I '='

to search for patterns in the source code; this example "lints" C-like code for the common mistake of assigning a variable (I for "identifier") in an IF statement caused by accidental use of '=' instead of the intended '=='.

The search is accomplished using the token indexes to speed up the search. Typically SCSE can search millions of lines of code in a few seconds, far faster than grep or other schemes that insist on reading the file content for each query. It also produces fewer false positives because the token checks are accurate, and the queries are much easier to write because one does not have to worry about white space/line breaks/comments.
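To make the idea of token-based search concrete, here is a toy sketch in Python. It is not how SCSE is implemented and does not use its query syntax; the tokenizer covers only a tiny C-like subset, the names `tokenize`, `build_index` and `search` are invented for the example, and unlike SCSE it simply drops comments rather than retaining them.

```python
import re
from collections import defaultdict

KEYWORDS = {"if", "else", "while", "for", "return"}

# Order matters: '==' is tried before '=' so a comparison lexes as one token.
TOKEN_RE = re.compile(r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\])*")
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<number>\d+)
  | (?P<op>==|!=|<=|>=|&&|\|\||[-+*/=<>!(){};,])
  | (?P<ws>\s+)
""", re.VERBOSE | re.DOTALL)

def tokenize(source):
    """Return (kind, text, line) triples, dropping whitespace and comments."""
    tokens = []
    for m in TOKEN_RE.finditer(source):
        kind, text = m.lastgroup, m.group()
        if kind in ("ws", "comment"):
            continue
        if kind == "ident" and text in KEYWORDS:
            kind = "keyword"
        line = source.count("\n", 0, m.start()) + 1
        tokens.append((kind, text, line))
    return tokens

def build_index(tokens):
    """Map each token text to the positions where it occurs."""
    index = defaultdict(list)
    for pos, (_, text, _) in enumerate(tokens):
        index[text].append(pos)
    return index

def search(tokens, index, query):
    """Return the start lines where the query matches consecutive tokens.

    Query items are literal token texts, or the wildcard "I" (any identifier).
    """
    first = query[0]
    # The index lets us jump straight to candidate start positions.
    starts = range(len(tokens)) if first == "I" else index.get(first, [])
    hits = []
    for start in starts:
        window = tokens[start:start + len(query)]
        if len(window) == len(query) and all(
            (kind == "ident") if q == "I" else (text == q)
            for q, (kind, text, _) in zip(query, window)
        ):
            hits.append(tokens[start][2])
    return hits
```

With this sketch, `search(tokens, index, ["if", "(", "I", "="])` finds an assignment inside an `if` even when it is split across lines, and it ignores both `==` comparisons and commented-out code, which is where a line-oriented regex tends to go wrong.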

A list of hits on the pattern can be logged or merely counted.

Normally SCSE is used interactively; queries produce a list of hits, and clicking on a hit produces a view of a page of the source text with the hit superimposed. However, one can also script calls on the SCSE.

SCSE can be obtained with language-accurate lexers for some 40 languages.

Upvotes: 0
