Reputation: 57
I have a list of vertex indices for which I need to get the corresponding vertex properties. The only way I can think of doing that is the following code:
[graph.vp["label"][ graph.vertex(i) ] for i in indices]
It works fine, but can I avoid the python loop altogether to achieve better speed?
The reason I'm asking is that I found this particular code to be much slower than an equivalent written entirely with plain Python data structures. For example, this is what I'm doing:
for t in range(args.num_trials):
    for b in budget:
        train, test = train_test_split(n, train_size=b, random_state=t)
        y_true = [graph.vp["label"][ graph.vertex(t) ] for t in test]
where "graph" is a graph-tool graph object. By contrast, here is another code snippet:
for t in range(args.num_trials):
    for b in budget:
        train, test = train_test_split(n, train_size=b, random_state=t)
        y_true = [graph.node_list[t].label for t in test]
where the graph is a custom-defined Python class built from basic Python data structures (e.g. node_list is a Python list of Node objects).
The issue is that the latter code runs much faster than the former. The first snippet takes around 7 seconds on average, whereas the second takes only 0.07 seconds on my machine. Everything else is the same in the two snippets except the last line. I found that the author mentions here that
graph-tool achieves greater performance by off-loading main loops to C++
So I was wondering how I can off-load the loop in this particular scenario, and what explains this poor performance with graph-tool?
Upvotes: 3
Views: 1715
Reputation: 5261
If your property maps have scalar values, you should access the property maps as arrays:
label = g.vp["label"]
la = label.a # returns an array view
print(la[50]) # label for vertex 50
which means you can do:
label = g.vp["label"]
for t in range(args.num_trials):
    for b in budget:
        train, test = train_test_split(n, train_size=b, random_state=t)
        y_true = label.a[test]
assuming that test above is a NumPy array of integers.
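To illustrate what the array indexing in the last line does, here is a minimal sketch that uses only NumPy. The labels array is a hypothetical stand-in for label.a (one scalar per vertex, ordered by vertex index), and test is a made-up index array, not the output of a real split:

```python
import numpy as np

# Hypothetical stand-in for `label.a`: one integer label per vertex,
# stored at the position given by the vertex index.
labels = np.array([3, 1, 4, 1, 5, 9, 2, 6])

# Hypothetical stand-in for the `test` split: vertex indices to look up.
test = np.array([7, 0, 3])

# Element-by-element lookup driven by a Python loop.
y_loop = [int(labels[i]) for i in test]

# One vectorized lookup: indexing an array with an integer array
# returns the selected elements without a Python-level loop.
y_vec = labels[test]

print(y_loop)          # [6, 3, 1]
print(y_vec.tolist())  # [6, 3, 1]
```

NumPy also accepts a plain Python list of integers as the index, so if test is a list rather than an array, labels[test] should behave the same way.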
If the value types are strings, then array access is not possible. Instead, you can speed things up by storing the property maps (instead of searching for them in the g.vp dictionary every time) and using indices instead of Vertex objects to query, i.e.
label = g.vp["label"]
for t in range(args.num_trials):
    for b in budget:
        train, test = train_test_split(n, train_size=b, random_state=t)
        y_true = [label[t] for t in test]
The above is just basic Python optimization.
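As a rough way to see the effect of hoisting the lookup out of the loop, here is a small timing sketch. It does not use graph-tool: props is a plain dict standing in for g.vp, and the label list and test indices are hypothetical, so the absolute numbers will not match what you see with real property maps.

```python
import timeit

# Hypothetical stand-ins: a dict playing the role of `g.vp`,
# holding a list of string labels indexed by vertex index.
props = {"label": [f"v{i}" for i in range(1000)]}
test = list(range(0, 1000, 3))

def repeated_lookup():
    # Fetches props["label"] again for every element.
    return [props["label"][i] for i in test]

def hoisted_lookup():
    # Fetches props["label"] once, then only indexes into it.
    label = props["label"]
    return [label[i] for i in test]

# Both variants must produce the same labels.
assert repeated_lookup() == hoisted_lookup()

print("repeated:", timeit.timeit(repeated_lookup, number=200))
print("hoisted: ", timeit.timeit(hoisted_lookup, number=200))
```

The same pattern applies to any lookup whose result does not change inside the loop; how much it saves depends on how expensive each individual lookup is.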
Upvotes: 3