Hooked

Reputation: 88138

Using python to remove all strings and comments from python code

I'd like to count the unique variable names across all the Python code I've written. To do so, I need to strip out the keywords and reserved words (which are known), the comments, and the strings. For example, the following code:

''' long comment '''
for x in range(y, y+foo):
    print "Hello", 'world', '''lookout for the # tricky nest''', q # comment

should be reduced to the tokens for, x, in, range, y, foo, print, q, which can then be filtered against the known set of keywords. Is this possible using Python itself (perhaps with the ast module)?
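A minimal sketch, not from the original post, of producing exactly that token stream with the standard-library tokenize module instead of ast. It assumes Python 3. Because tokenize only lexes the source, the Python 2 print statement in the sample still tokenizes, and STRING and COMMENT tokens are dropped simply by keeping NAME tokens. keyword.iskeyword stands in for the "known set of keywords"; under Python 3, print is not a keyword, so it survives that filter. The names name_tokens, identifiers and SAMPLE are illustrative only.

```python
import io
import keyword
import tokenize

def name_tokens(source):
    """Return every NAME token in source, in order.

    Strings and comments get their own token types (STRING, COMMENT),
    so keeping only NAME tokens discards them.
    """
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.NAME]

def identifiers(source):
    """NAME tokens with Python keywords (per keyword.iskeyword) removed."""
    return [name for name in name_tokens(source) if not keyword.iskeyword(name)]

# The question's example, verbatim (Python 2 syntax is fine for tokenize).
SAMPLE = """\
''' long comment '''
for x in range(y, y+foo):
    print "Hello", 'world', '''lookout for the # tricky nest''', q # comment
"""

print(name_tokens(SAMPLE))  # ['for', 'x', 'in', 'range', 'y', 'y', 'foo', 'print', 'q']
print(identifiers(SAMPLE))  # ['x', 'range', 'y', 'y', 'foo', 'print', 'q']
```

Wrapping the result in set() gives the unique names; tokenize also works on files that the running interpreter could not parse.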

Upvotes: 1

Views: 585

Answers (2)

Mark Tolonen

Reputation: 177610

This is my first time playing with the ast module, but it was relatively easy to collect all the object names referenced in a source file:

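# Written for Python 2: the x.py below uses the print statement.
import ast

class Visit(ast.NodeVisitor):
    """Collect the identifier of every Name node in a parsed module."""
    def __init__(self):
        ast.NodeVisitor.__init__(self)
        self.s = set()
    def visit_Name(self, node):
        # Variable and function references are Name nodes; comments are
        # dropped by the parser and strings become Str nodes, so neither
        # ever reaches this method.
        self.s.add(node.id)

with open('x.py') as f:
    a = ast.parse(f.read())
v = Visit()
v.visit(a)
print(v.s)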

Where x.py was:

''' long comment '''
q=7
y=0
foo=10
for x in range(y,y+foo):
    print "Hello", 'world', '''lookout for the # tricky nest''', q # comment

Output:

set(['q', 'y', 'range', 'foo', 'x'])

Note that keywords are already excluded, but it does pick up the built-in function name range.
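A Python 3 adaptation of the visitor above, offered as a sketch rather than the answerer's code. It counts occurrences with collections.Counter (the question is about counting), also records function parameters (Python 3 represents them as ast.arg nodes, not Name nodes), and optionally drops built-in names such as range by checking dir(builtins). In Python 3, print is a function and would otherwise show up as a Name too. NameCounter, count_names and SOURCE are hypothetical names; SOURCE is the x.py example rewritten with print() so that Python 3 can parse it.

```python
import ast
import builtins
from collections import Counter

class NameCounter(ast.NodeVisitor):
    """Count identifiers appearing as Name nodes or function parameters."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def visit_Name(self, node):
        # Variable reads and writes: x, y, foo, and also range, print, ...
        self.counts[node.id] += 1

    def visit_arg(self, node):
        # Function parameters are ast.arg nodes, not ast.Name nodes.
        self.counts[node.arg] += 1
        self.generic_visit(node)  # still visit any annotation

def count_names(source, drop_builtins=True):
    """Parse source and return a Counter of the identifiers it references."""
    visitor = NameCounter()
    visitor.visit(ast.parse(source))
    if drop_builtins:
        for name in dir(builtins):
            visitor.counts.pop(name, None)
    return visitor.counts

SOURCE = """\
''' long comment '''
q = 7
y = 0
foo = 10
for x in range(y, y + foo):
    print("Hello", 'world', '''lookout for the # tricky nest''', q)  # comment
"""

print(sorted(count_names(SOURCE).items()))
# [('foo', 2), ('q', 2), ('x', 1), ('y', 3)]
```

Names introduced by def and class are stored as plain strings on those nodes, so this sketch does not count them; an extra visitor method would be needed for that.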

Upvotes: 2

TigerhawkT3

Reputation: 49318

If you're more concerned with getting the list of variables than with stripping out all strings, comments, etc., you could try something like:

for name in (set(locals()) | set(globals())):
    print(name)

This prints every name found in either the local or the global namespace. Use dir(myobject) to list the attributes of myobject.
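A small sketch of what locals() and dir() report. Both inspect the namespaces of running code, so they list the names that exist at that point in execution rather than every name written in a source file. demo, Point and public_attributes are hypothetical, and hiding dunder names is just one filtering choice.

```python
def demo(a, b):
    total = a + b
    # locals() is a snapshot of the names bound in this call so far.
    return sorted(locals())

class Point:
    """Hypothetical example class for the dir() demonstration."""
    def __init__(self):
        self.x = 1
        self.y = 2

def public_attributes(obj):
    """Names dir() reports for obj, without dunder names such as __init__."""
    return [name for name in dir(obj) if not name.startswith("__")]

print(demo(1, 2))                  # ['a', 'b', 'total']
print(public_attributes(Point()))  # ['x', 'y']
```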

https://docs.python.org/3/library/functions.html

Upvotes: -1
