Nasif Imtiaz Ohi

Reputation: 1713

How to automatically detect code snippets in a text sample?

I'm doing some analysis on GitHub comments. For that, I need to automatically exclude code samples and error messages from a large set of comments.

Put another way, I want to keep only the English part of each comment. Although there are a few libraries that detect the language of a sentence, there are a few challenges in my case: 1) the comment text does not always follow proper English grammar, and 2) the code samples and error messages mainly consist of English words too.

So what would be the best approach? The results don't need to be 100% accurate; I just want an approach that gives a satisfactory result. Any ideas?

Upvotes: 1

Views: 1191

Answers (1)

dTanMan

Reputation: 137

This question is old, but my Google search led me here, so I'm offering this answer in case anyone else stumbles upon this question too.
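As a rough starting point, here is a minimal sketch of one heuristic filter, assuming the comments are GitHub-flavored Markdown. Everything in it is an illustrative assumption rather than an established method: the names `strip_code` and `looks_like_code`, the error-message patterns, the symbol set, and the 0.15 threshold are all hypothetical. It first drops fenced blocks and inline code spans, then drops remaining lines that are indented, resemble a stack trace, or are unusually dense in programming symbols.

```python
import re

# Markdown fenced blocks (``` or ~~~) and inline `code` spans.
FENCED_BLOCK = re.compile(r"^(`{3,}|~{3,}).*?^\1[ \t]*$", re.DOTALL | re.MULTILINE)
INLINE_CODE = re.compile(r"`[^`\n]+`")

# Hypothetical patterns for common error-message and stack-trace lines.
ERROR_LINE = re.compile(
    r"Traceback \(most recent call last\)"
    r"|^\s*File \".+\", line \d+"
    r"|^\s*at [\w.$<>]+\(.*\)\s*$"
    r"|\b\w+(Error|Exception)\b:"
)

# Assumed symbol set and threshold; tune both on your own data.
CODE_SYMBOLS = set("{}()[];=<>|&$_\\")
SYMBOL_RATIO = 0.15


def looks_like_code(line: str) -> bool:
    """Guess whether a single unfenced line is code or an error message."""
    # Markdown treats lines indented by 4 spaces or a tab as code.
    if line.startswith(("    ", "\t")):
        return True
    if ERROR_LINE.search(line):
        return True
    stripped = "".join(line.split())  # drop all whitespace
    if not stripped:
        return False
    symbols = sum(ch in CODE_SYMBOLS for ch in stripped)
    return symbols / len(stripped) > SYMBOL_RATIO


def strip_code(comment: str) -> str:
    """Return the comment with likely code and error lines removed."""
    text = FENCED_BLOCK.sub("", comment)
    text = INLINE_CODE.sub("", text)
    kept = [
        " ".join(line.split())  # normalize leftover whitespace
        for line in text.splitlines()
        if line.strip() and not looks_like_code(line)
    ]
    return "\n".join(kept)
```

Since the results don't need to be 100% accurate, a filter like this may be enough on its own. If not, its output on a hand-checked sample of comments could help decide whether a trained classifier is worth the effort.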

Upvotes: 2
