Fenglun Zhang
Fenglun Zhang

Reputation: 21

For each sub-tree in an AST tree, create a corresponding list using Python

Recently, I was working on a project of detecting plagiarism in python code. I learned an interesting method from an literature (https://www.researchgate.net/publication/221554617). In this article, authors created a characteristic vector which is a numerical approximation of a particular sub-tree. Each vector consists of the sum of the node’s children’s vectors.

characteristic vectors with an AST

To do this, I need to create a 8-length list for each sub-tree in order to record information. The problem is that there must be a correspondence between the list and the subtree, otherwise it will be meaningless. Using a dictionary may not be a good way because I need to traverse the tree.

Ideally I want to get a new storage structure like newnode which has attributes like below:



#Points to the current node(or sub-tree)
newnode.node

#The list corresponding to the node
newnode.vector

Here is my attempt (you can ignore if you want):

import ast
import numpy as np


class NewNode:
    def __init__(self, node):
        self.node = node
        self.vector = np.zeros(8)


class Operation(ast.NodeTransformer):

    def generic_visit(self, node):
        ast.NodeTransformer.generic_visit(self, node)
        new_node = NewNode(node)
        print(type(node).__name__)
        # the code below is an example of record ast.Store node-type
        if isinstance(new_node.node, ast.Store):
            new_node.vector[5] += 1


source = \
    """
a = 10
b = "test"
print(a)
    """
node = ast.parse(source)
k = Operation()
k.generic_visit(node)

Upvotes: 2

Views: 451

Answers (1)

Tadhg McDonald-Jensen
Tadhg McDonald-Jensen

Reputation: 21453

you are allowed to add fields to the nodes in place, that way you can just do node.vector = ... right on the node itself and then after visiting all nodes in the tree will have said vector with what ever logic you want applied. Note that generic_visit needs to return node or the visiter breaks stuff (returning nothing assumes you want to just destroy that subtree)

import ast
import numpy as np

characteristics_map = {
    # can just store the array for each node type, might be easier than case logic.
    ast.Store: [0, 0, 0, 0, 0, 1, 0, 0],
}

class Operation(ast.NodeTransformer):

    def generic_visit(self, node):
        # you are allowed to add arbitrary fields to python objects
        node.vector = np.zeros(8)
        ast.NodeTransformer.generic_visit(self, node)
        for a in ast.iter_child_nodes(node):
            # this node will be sum of all children nodes
            # as well as case logic below.
            node.vector += a.vector

        if type(node) in characteristics_map:
            node.vector += np.array(characteristics_map[type(node)])
        return node


source = \
    """
a = 10
b = "test"
print(a)
    """
node = ast.parse(source)
k = Operation()
k.generic_visit(node)
# there are 2 stores so the 5th element has a value of 2
print(node.vector)

Upvotes: 2

Related Questions