gowrath

Reputation: 3224

How node2vec works

I have been reading about the node2vec embedding algorithm and I am a little confused about how it works.

For reference, node2vec is parametrised by p and q and works by simulating a bunch of random walks from nodes and then running word2vec embeddings on these walks as "sentences". By setting p and q in different ways, you can get more BFS-like or more DFS-like random walks in the simulation phase, capturing different network structure in the embedding.
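To make sure I have the overall picture right, here is a rough sketch of the pipeline as I understand it - plain uniform random walks for simplicity (so the p/q bias is left out), a placeholder graph, and arbitrary parameter values:

```python
import random

import networkx as nx
from gensim.models import Word2Vec

G = nx.karate_club_graph()  # placeholder graph, just for illustration

def random_walk(G, start, length):
    """Uniform random walk; node2vec's p/q bias is left out here."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = list(G.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return [str(n) for n in walk]  # word2vec expects string tokens

# simulate a bunch of walks from every node and treat them as "sentences"
walks = [random_walk(G, node, length=10) for node in G.nodes() for _ in range(20)]

# run word2vec (skip-gram) on the walks to get node embeddings
model = Word2Vec(walks, vector_size=64, window=5, min_count=1, sg=1)
embedding_of_node_0 = model.wv["0"]
```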

Setting q > 1 gives us more BFS behaviour, in that the sampled walks consist of nodes within a small locality. The thing I am confused about is that the paper says this is equivalent to embedding nodes with similar structural properties close to each other.
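If I read the paper's search bias correctly, q enters as an unnormalised weight on each candidate next node, depending on how far it is from the node visited one step earlier - something like this sketch (the function name is mine, not from any library):

```python
import random

def biased_step(G, prev, curr, p, q):
    """One step of a 2nd-order node2vec-style walk: candidates are weighted by
    where they sit relative to the previous node. With q > 1 the 1/q weight
    penalises moving further away, so the walk stays within a small locality
    (BFS-like); with q < 1 wandering off is encouraged (DFS-like)."""
    candidates, weights = [], []
    for x in G.neighbors(curr):
        if x == prev:               # step back to where we came from
            w = 1.0 / p
        elif G.has_edge(x, prev):   # stay at distance 1 from the previous node
            w = 1.0
        else:                       # move to distance 2 from the previous node
            w = 1.0 / q
        candidates.append(x)
        weights.append(w)
    return random.choices(candidates, weights=weights, k=1)[0]
```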

I don't quite understand how that works. If I have two separate, say, star/hub-structured nodes in my network that are far apart, why would embedding based on the random walks from those two nodes place them close together in the embedding space?

Upvotes: 5

Views: 1230

Answers (2)

TandemElephant

Reputation: 11

My understanding of the two sampling strategies goes like this:

  • DFS: for each node (a), the walk explores a wide context, containing not just the immediate neighbors (b) but also nodes further away (c). When optimizing the embedding and trying to bring nodes with similar contexts closer, the optimizer has to consider not just the relation (a)-(b), but also (b)-(c), and so on. This amounts to placing nodes so that their distances in the network are preserved (each node finds its place based on a wide context).

  • BFS: for each node (a), the walk only explores the local context, but it does so extensively, so probably all neighbors (b1, b2, ...) will be included (and maybe some 2nd neighbors). Imagine trying to find a node's place in the embedding space while only having information about its neighbors. Nodes that have similarly embedded neighbors should end up close, e.g. dangling nodes with only one neighbor (whose walks therefore contain the source node many times), or nodes whose two neighbors both have high degree (i.e. a bridge connecting two hubs). By only knowing local information, the embedding does not optimize for global distances, so the result is not based on the global graph structure but rather on local patterns (called structural equivalence in the paper, just to make it confusing). A rough toy sketch of the dangling-node case follows this list.
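Here is the kind of toy check I have in mind for the dangling-node case (the graph and walk length are made up, and these are plain uniform walks standing in for local BFS-like samples, not proper node2vec walks): two dangling nodes hanging off different hubs produce walks dominated by the same local pattern, even though they are far apart in the graph.

```python
import random
from collections import Counter

import networkx as nx

# made-up toy graph: two hubs, each with dangling nodes, joined by one edge
G = nx.Graph()
G.add_edges_from(("hub1", f"a{i}") for i in range(5))
G.add_edges_from(("hub2", f"b{i}") for i in range(5))
G.add_edge("hub1", "hub2")

def short_walk(G, start, length=4):
    """Short uniform walk, standing in for a BFS-like (local) sample."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(list(G.neighbors(walk[-1]))))
    return walk

for start in ("a0", "b0"):          # dangling nodes on different hubs
    walks = [short_walk(G, start) for _ in range(1000)]
    counts = Counter(node for walk in walks for node in walk)
    print(start, counts.most_common(3))

# both are dominated by the dangling node itself and its hub, i.e. the local
# pattern looks the same even though a0 and b0 are far apart in the graph
```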

BUT!!! I tried reproducing the results for the Les Misérables network with the parameters used in the original paper (p=1, q=0.5 and p=1, q=2), and couldn't get node2vec to do this second, structural kind of embedding. Something fishy is going on, as others also struggle to get node2vec to embed structurally; here is a paper on it. If someone was able to reproduce their results, please tell me how :)
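For reference, this is roughly the setup I used for the attempt - hand-rolled 2nd-order walks plus gensim's skip-gram, ignoring edge weights; apart from p and q, the parameter values below are my own guesses, not the paper's:

```python
import random

import networkx as nx
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

G = nx.les_miserables_graph()

def node2vec_walk(G, start, length, p, q):
    """2nd-order biased walk (unweighted, no alias sampling)."""
    walk = [start]
    while len(walk) < length:
        curr = walk[-1]
        nbrs = list(G.neighbors(curr))
        if len(walk) == 1:
            walk.append(random.choice(nbrs))
            continue
        prev = walk[-2]
        weights = [1 / p if x == prev else 1.0 if G.has_edge(x, prev) else 1 / q
                   for x in nbrs]
        walk.append(random.choices(nbrs, weights=weights, k=1)[0])
    return [str(n) for n in walk]

def embed(G, p, q, dims=16, walk_len=80, walks_per_node=10, window=10):
    walks = [node2vec_walk(G, n, walk_len, p, q)
             for n in G.nodes() for _ in range(walks_per_node)]
    return Word2Vec(walks, vector_size=dims, window=window, min_count=1, sg=1)

# p=1, q=2 is the setting the paper reports as structural; clustering the
# embeddings (as the paper does for its node colouring) is how I checked
# whether nodes with similar roles end up together
model = embed(G, p=1, q=2)
vectors = [model.wv[str(n)] for n in G.nodes()]
clusters = KMeans(n_clusters=3, n_init=10).fit_predict(vectors)
```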

Upvotes: 1

bluesummers

Reputation: 12627

This question occupied my mind as well after reading the paper, and even more so after empirically seeing that it indeed does this.

I assume you are referring to the part in the paper showing the following diagram, which states that the resulting embeddings of u and s6 will be quite similar in the embedding space: [figure from the paper showing node u and its neighborhood]

To understand why this happens, we first have to understand how the skip-gram model, the mechanism that consumes the random walks, embeds information. The skip-gram model ultimately generates similar embeddings for tokens that appear in similar contexts - but what does that really mean from the skip-gram model's perspective? If we want to embed structural equivalence, we would favor a DFS-like walk (and additionally we would have to use an adequate window size for the skip-gram model). So the random walks would look like:

1. s1 > u > s4 > s5 > s6 > s8
2. s8 > s6 > s5 > s4 > u > s1
3. s1 > s3 > u > s2 > s5 > s6
4. s7 > s6 > s5 > s2 > u > s3
.
.
n. .....

What will happen is that there will be many walks in which u and s6 appear with the same surroundings. Since their surroundings are similar, their contexts are similar, and as stated, similar context == similar embeddings.

One might further ask: what about order? Order doesn't really matter, since the skip-gram model uses the window size to generate pairs out of every sentence; in the link I provided you can further explore this concept.
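To make the "pairs out of every sentence" point concrete, here is a minimal sketch of how the training pairs fall out of the walks listed above (window size 2 is an arbitrary choice for illustration):

```python
def skipgram_pairs(walk, window):
    """Generate (center, context) training pairs the way skip-gram does:
    every token is paired with each token at most `window` positions away,
    so the order inside the window is irrelevant."""
    pairs = []
    for i, center in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs

walks = [
    ["s1", "u", "s4", "s5", "s6", "s8"],
    ["s8", "s6", "s5", "s4", "u", "s1"],
    ["s1", "s3", "u", "s2", "s5", "s6"],
    ["s7", "s6", "s5", "s2", "u", "s3"],
]

# contexts that u and s6 are trained against across all walks
ctx_u  = {c for w in walks for a, c in skipgram_pairs(w, window=2) if a == "u"}
ctx_s6 = {c for w in walks for a, c in skipgram_pairs(w, window=2) if a == "s6"}
print(ctx_u & ctx_s6)  # overlapping contexts -> similar embeddings
```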

So, bottom line: if you can create walks that produce similar contexts for two nodes, their embeddings will be similar.

Upvotes: 3
