Reputation: 21
Recently, I was working on a project of detecting plagiarism in python code. I learned an interesting method from an literature (https://www.researchgate.net/publication/221554617). In this article, authors created a characteristic vector which is a numerical approximation of a particular sub-tree. Each vector consists of the sum of the node’s children’s vectors.
characteristic vectors with an AST
To do this, I need to create a 8-length list for each sub-tree in order to record information. The problem is that there must be a correspondence between the list and the subtree, otherwise it will be meaningless. Using a dictionary may not be a good way because I need to traverse the tree.
Ideally I want to get a new storage structure like newnode which has attributes like below:
#Points to the current node(or sub-tree)
newnode.node
#The list corresponding to the node
newnode.vector
Here is my attempt (you can ignore if you want):
import ast
import numpy as np
class NewNode:
def __init__(self, node):
self.node = node
self.vector = np.zeros(8)
class Operation(ast.NodeTransformer):
def generic_visit(self, node):
ast.NodeTransformer.generic_visit(self, node)
new_node = NewNode(node)
print(type(node).__name__)
# the code below is an example of record ast.Store node-type
if isinstance(new_node.node, ast.Store):
new_node.vector[5] += 1
source = \
"""
a = 10
b = "test"
print(a)
"""
node = ast.parse(source)
k = Operation()
k.generic_visit(node)
Upvotes: 2
Views: 451
Reputation: 21453
you are allowed to add fields to the nodes in place, that way you can just do node.vector = ...
right on the node itself and then after visiting all nodes in the tree will have said vector with what ever logic you want applied. Note that generic_visit
needs to return node
or the visiter breaks stuff (returning nothing assumes you want to just destroy that subtree)
import ast
import numpy as np
characteristics_map = {
# can just store the array for each node type, might be easier than case logic.
ast.Store: [0, 0, 0, 0, 0, 1, 0, 0],
}
class Operation(ast.NodeTransformer):
def generic_visit(self, node):
# you are allowed to add arbitrary fields to python objects
node.vector = np.zeros(8)
ast.NodeTransformer.generic_visit(self, node)
for a in ast.iter_child_nodes(node):
# this node will be sum of all children nodes
# as well as case logic below.
node.vector += a.vector
if type(node) in characteristics_map:
node.vector += np.array(characteristics_map[type(node)])
return node
source = \
"""
a = 10
b = "test"
print(a)
"""
node = ast.parse(source)
k = Operation()
k.generic_visit(node)
# there are 2 stores so the 5th element has a value of 2
print(node.vector)
Upvotes: 2