UnderpoweredNinja

Reputation: 127

Detect programming language of a snippet using Pygments

I'm using the guess_lexer() function from the Pygments library to identify the language of a source code snippet. This is how I'm using it right now:

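from pygments.lexers import guess_lexer
text = "string containing source code"
# guess_lexer() returns a lexer instance chosen by heuristics
lexer_subclass = guess_lexer(text)
# print() with parentheses works on both Python 2 and Python 3
print(str(lexer_subclass))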

Depending on the language detected in the text variable, this prints something like:

<pygments.lexers.PythonLexer>

What I want is only the PythonLexer part. I'm aware that I can get it using string manipulation, but it feels hacky. I want to do it in the correct way.

So I tried to see what Pygments is doing internally and found this method, which is responsible for outputting the lexer name:

def __repr__(self):
    if self.options:
        return '<pygments.lexers.%s with %r>' % (self.__class__.__name__,
                                                 self.options)
    else:
        return '<pygments.lexers.%s>' % self.__class__.__name__

Sure enough, if I modify it to return only self.__class__.__name__, I'll get what I want, but that doesn't feel right.

How can I get what I want? Maybe inheriting the class and then overriding the function or something? Any ideas will be appreciated.

Upvotes: 5

Views: 1300

Answers (1)

UnderpoweredNinja

Reputation: 127

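Turns out the solution was simple. Every lexer exposes its human-readable language name through the name attribute (for example, Python rather than PythonLexer), so I just had to use the following: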

guess_lexer(text).name
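For anyone who does need the literal class name (PythonLexer) rather than the language name, here is a minimal sketch of both lookups side by side. The helper describe_lexer and the sample snippet are made up for illustration and are not part of Pygments. Only the name attribute and the standard type(...).__name__ lookup are relied on.

```python
from pygments.lexers import PythonLexer, guess_lexer


def describe_lexer(lexer):
    """Hypothetical helper: return (language name, class name) for a lexer."""
    # lexer.name is the human-readable name defined on the lexer class;
    # type(lexer).__name__ is the Python class name used by __repr__.
    return lexer.name, type(lexer).__name__


# A known lexer makes the difference between the two values visible.
print(describe_lexer(PythonLexer()))  # ('Python', 'PythonLexer')

# guess_lexer() is heuristic, so the result depends on the snippet given.
snippet = "#!/usr/bin/env python\nimport os\nprint(os.getcwd())\n"
print(describe_lexer(guess_lexer(snippet)))
```

Neither lookup requires subclassing the lexer or touching __repr__, so the installed Pygments stays unmodified.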

Upvotes: 4
