Thomas

Reputation: 10105

Source Code detection with Java

Any idea how to detect source code (Java, C#, SQL, and so on) in a text file using Java, without looking at the file extension or writing an extraordinarily long, homemade regular expression?

Maybe there are some tools doing this work already?

Upvotes: 6

Views: 1081

Answers (4)

Tomasz Nurkiewicz

Reputation: 340883

Have a look at Linguist. From its README:

"We use this library at GitHub to detect blob languages, highlight code, ignore binary files, suppress generated files in diffs and generate language breakdown graphs."

Unfortunately, it is written in Ruby. Maybe JRuby can handle it?

Upvotes: 3

mikek

Reputation: 1555

No. Without using a syntax analyzer (which is pretty much a more complex variant of a regexp), there is no way to tell a source code file from a regular text file. The difference between source code and text can be as small as a one-letter typo, if you think about it.

Upvotes: 1

kan

Reputation: 28981

There is an old library, jMimeMagic: http://sourceforge.net/projects/jmimemagic/. Try it; I hope it gives satisfactory results.

Upvotes: 1

Lajos Arpad

Reputation: 76817

You should pick a minimal set of keywords and define some logical rules over them. If you define the right rules, the resulting regular expression will not be extraordinarily big. Note that the fewer keywords and rules you have, the higher the probability of a mistake: a false positive (SourceCode = true for a file that is not source code) or a false negative (SourceCode = false for a file that is source code). Conversely, the more keywords and rules you have, the more time is needed to check whether a file is source code or not.
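A rough sketch of what such keyword rules might look like in Java (9 or later, for List.of). Everything here is an assumption made for illustration: the class name KeywordLanguageGuesser, the handful of patterns per language, and the MIN_MATCHES threshold are all hypothetical and would need tuning against real sample files. Each language gets a few small regular expressions instead of one huge one, and a file is only classified when at least two of them match.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class KeywordLanguageGuesser {

    // Hypothetical threshold: a language needs at least this many matching rules.
    private static final int MIN_MATCHES = 2;

    // Hypothetical, deliberately tiny rule sets (illustrative only, not exhaustive).
    private static final Map<String, List<Pattern>> RULES = new LinkedHashMap<>();
    static {
        RULES.put("Java", List.of(
            Pattern.compile("\\bimport\\s+java\\."),
            Pattern.compile("\\bpublic\\s+(final\\s+)?class\\s+\\w+"),
            Pattern.compile("\\bSystem\\.out\\.print"),
            Pattern.compile("\\bpublic\\s+static\\s+void\\s+main\\b")));
        RULES.put("C#", List.of(
            Pattern.compile("\\busing\\s+System\\b"),
            Pattern.compile("\\bnamespace\\s+\\w+"),
            Pattern.compile("\\bConsole\\.Write"),
            Pattern.compile("\\bstatic\\s+void\\s+Main\\b")));
        RULES.put("SQL", List.of(
            Pattern.compile("\\bSELECT\\b[\\s\\S]+?\\bFROM\\b", Pattern.CASE_INSENSITIVE),
            Pattern.compile("\\bINSERT\\s+INTO\\b", Pattern.CASE_INSENSITIVE),
            Pattern.compile("\\bCREATE\\s+TABLE\\b", Pattern.CASE_INSENSITIVE),
            Pattern.compile("\\bWHERE\\b.+?=", Pattern.CASE_INSENSITIVE)));
    }

    /** Returns the language whose rules match most often, or "unknown". */
    public static String guess(String text) {
        String best = "unknown";
        int bestScore = MIN_MATCHES - 1; // a language must beat this to be reported
        for (Map.Entry<String, List<Pattern>> entry : RULES.entrySet()) {
            int score = 0;
            for (Pattern rule : entry.getValue()) {
                if (rule.matcher(text).find()) {
                    score++;
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }
}
```

Calling KeywordLanguageGuesser.guess(text) with the file contents gives a best guess. The threshold is where the trade-off shows up: a sentence like "Please select one item from the list." matches a single SQL rule and is still reported as "unknown", but lowering MIN_MATCHES to 1 would misclassify it as SQL.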

Upvotes: 1
