Anand

Reputation: 7764

Finding patterns in source code

If I wanted to learn about pattern recognition in general what would be a good place to start (recommend a book)?

Also, does anybody have experience applying these algorithms to find abstraction patterns in programs (repeated code, chunks of code that do the same thing in slightly different ways, etc.)?

Thanks

Edit: I don't mind mathematically intensive books. In fact, that would be a good thing.

Upvotes: 2

Views: 3164

Answers (8)

Ira Baxter

Reputation: 95392

One kind of pattern is code that has been cloned by copy and paste methods. See CloneDR for a tool that automatically finds such code in spite of variations in layout and even changes in the body of the clone, by comparing abstract syntax trees for the language in question.

CloneDR works with a variety of languages: C, C++, C#, Java, JavaScript, PHP, COBOL, Python, ... The website shows clone detection reports for a variety of programming languages.
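To make the idea of comparing syntax trees concrete, here is a minimal sketch using Python's standard `ast` module. It is not how CloneDR works internally; it is a simplified illustration that assumes clones differ only in variable, argument, and function names and in literal values. The names `_Normalize`, `find_clones`, `min_nodes`, and `SAMPLE` are made up for this example.

```python
import ast
from collections import defaultdict


class _Normalize(ast.NodeTransformer):
    """Replace names and literal values with placeholders so that
    renamed copies of the same code produce identical trees."""

    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

    def visit_arg(self, node):
        node.arg = "_"
        return node

    def visit_Constant(self, node):
        # Keep only the literal's type, so 2 and 3 compare equal but 2 and "2" do not.
        placeholder = ast.Constant(value=type(node.value).__name__)
        return ast.copy_location(placeholder, node)

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # normalize arguments and body first
        node.name = "_"
        return node


def find_clones(source, min_nodes=8):
    """Return lists of line numbers of statements whose normalized
    syntax trees are identical. Statements with fewer than `min_nodes`
    AST nodes are skipped to avoid trivial matches."""
    tree = _Normalize().visit(ast.parse(source))
    groups = defaultdict(list)
    for node in ast.walk(tree):
        if not isinstance(node, ast.stmt):
            continue
        size = sum(1 for _ in ast.walk(node))
        if size >= min_nodes:
            # ast.dump omits line numbers by default, so it works as a structural key.
            groups[ast.dump(node)].append(node.lineno)
    return [sorted(lines) for lines in groups.values() if len(lines) > 1]


SAMPLE = '''
def total_price(items):
    result = 0
    for item in items:
        result += item * 2
    return result

def total_weight(boxes):
    acc = 1
    for box in boxes:
        acc += box * 3
    return acc
'''

for lines in find_clones(SAMPLE):
    print("structurally identical statements start at lines", lines)
```

Exact matching on a normalized tree only catches renamed copies. Tolerating inserted or deleted statements, as the answer describes, needs a similarity measure between trees rather than an equality test.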

Upvotes: 0

TWith2Sugars

Reputation: 3434

This is specific to .NET and Visual Studio, but it finds duplicate code in your project. In my experience it reports some false positives, but it could be a good place to start.

Clone Detective

Upvotes: 0

mskfisher

Reputation: 3406

Another project you can look into is Duplo - it's an open-source/GPL project, so you can pore over their approach by grabbing the code from SourceForge.

Upvotes: 0

Ian G

Reputation: 10929

If you are reasonably confident mathematically, then either of Chris Bishop's books, "Pattern Recognition and Machine Learning" or "Neural Networks for Pattern Recognition", is a very good way to learn about pattern recognition.

Upvotes: 2

jheriko

Reputation: 3113

It helps if you have access to the parse tree generated during compilation. You can then look for pieces of the tree that are similar while ignoring nodes deeper than the level you are interested in. For example, you can pick out nodes that multiply two sub-expressions together, ignoring the contents of those sub-expressions. The same logic applies to a collection of nodes: to find a multiplication of two sub-expressions that are themselves additions of further sub-expressions, first look for multiplies, then check whether the two nodes underneath each multiply are additions, ignoring anything deeper.
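The depth-limited matching described above can be sketched with Python's standard `ast` module. This is only an illustration under simplifying assumptions: it looks at binary operators only, and the helper names `op_shape`, `find_shape`, and `SOURCE` are invented for the example. `ast.unparse` requires Python 3.9 or later.

```python
import ast


def op_shape(node, depth):
    """Describe a binary expression by its operators down to `depth`
    levels. Anything deeper, or anything that is not a binary
    operation, is reduced to the wildcard "*"."""
    if depth == 0 or not isinstance(node, ast.BinOp):
        return "*"
    return (
        type(node.op).__name__,
        op_shape(node.left, depth - 1),
        op_shape(node.right, depth - 1),
    )


def find_shape(source, wanted, depth):
    """Return the source text of every binary expression whose shape,
    truncated at `depth` levels, equals `wanted`."""
    tree = ast.parse(source)
    return [
        ast.unparse(node)
        for node in ast.walk(tree)
        if isinstance(node, ast.BinOp) and op_shape(node, depth) == wanted
    ]


SOURCE = """
x = (a + b) * (c + d)
y = (a - b) * (c + d)
z = p * q
w = (f(1) + g[2]) * (h + 3 * k)
"""

# One level deep: any multiplication, whatever its operands contain.
any_multiply = ("Mult", "*", "*")
print(find_shape(SOURCE, any_multiply, depth=1))

# Two levels deep: a multiplication whose operands are both additions.
product_of_sums = ("Mult", ("Add", "*", "*"), ("Add", "*", "*"))
print(find_shape(SOURCE, product_of_sums, depth=2))
```

With the second pattern, the expressions assigned to `x` and `w` match even though their operands differ, because everything below the second level is ignored. The expression assigned to `y` does not match, since its left operand is a subtraction.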

Upvotes: 1

Aaron Digulla

Reputation: 328724

Other interesting projects are PMD and Eclipse.

Eclipse uses ASTs (abstract syntax trees) for all source code in any project. Tools can then register for certain types of ASTs (like Java source) and get a preprocessed view to which they can add information (like links to documentation, error markers, etc.).

Upvotes: 0

krosenvold

Reputation: 77201

If you're working in one of the supported languages, IntelliJ IDEA has a really smart structural search and replace that would fit your problem.

Upvotes: 0

joel.neely

Reputation: 30943

I'd suggest looking at the code of some open source project (e.g. FindBugs or SIM) that does the kind of thing you're talking about.

Upvotes: 0
