Reputation: 2049
I am intending to use Graph Kernel to perform similarity measure between different computer programs. During compilation of computer program, they are represented as some form of graphs as depicted in the figure with control flow.
The figure is extracted from the paper "Using Graph based program characterization for predictive modeling"
I am intending to use Graph kernels and for this purpose I am seeking help in how these graphs could be encoded (format) with each node (basic block) containing feature vector (characteristics of that particular basic block) so that they can be fed to the kernel to calculate pre-computed matrices.
Upvotes: 2
Views: 613
Reputation: 66775
Kernel functions defined for non-numerical objects like strings or graphs were introduced mostly to avoid encoding of this structures. The core idea is to compute the kernel value directly on objects from the non-numerical space, like in this example - graphs. Your particular example is an instance of labeled verticle graph (no labels on edges), so you can simply use the graph kernel for such structures. In the Graph Kernels paper it is introduced as a edge labeled structure, but change from edge labeling to vertice labeling is quite natural (and already done in other papers). So what remains is computation of similarity between particular vertice v_i
and v_j
. In oridignal paper we simply have a matrix W
(responsible for expressing "similarity" of particular edge labelings) so analogusly you can compute some kind of similarity between vertices feature vectors (there are dozens possibilities, and choice of particular one is heavily data dependent, you can experiment with cosine similarity, hamming distance, MSE etc.), but the core idea remains the same. You first compute the vertice-vertice similarity matrix to use in the product graph and then simply apply corresponding graph kernel on your data. It is not simple solution, but I don't think that there exists the simple (good) one. Graph kernels are quite young objects (introduced 11 years ago) to work with really complex objects (and your particular problem is a great example of very complex object for classification). You should have in mind, that it can be very computationally expensive to use kernel methods on your graphs, and so it may be better idea to work with some simplier models (working on some simple features of your graphs instead of whole "raw" data).
Upvotes: 2