Reputation: 368
When I ran StellarGraph's demo on graph classification using DGCNNs, I got the same result as in the demo.
However, when I tested what happens if I first shuffle the data with the following code:
import random

# Shuffle graphs and labels together so each graph keeps its own label
shuffler = list(zip(graphs, graph_labels))
random.shuffle(shuffler)
graphs, graph_labels = zip(*shuffler)
The model didn't learn at all (accuracy of around 50%, i.e. no better than the class distribution).
Does anyone know why this happens? Did I shuffle the data in the wrong way? Is the data supposed to stay unshuffled in the first place (and if so, why? That doesn't make sense to me)? Or is it a bug in StellarGraph's implementation?
Upvotes: 0
Views: 72
Reputation: 368
I found the problem. It had nothing to do with the shuffling code or with StellarGraph's implementation. The problem was in the demo, at the following lines:
train_gen = gen.flow(
    list(train_graphs.index - 1),
    targets=train_graphs.values,
    batch_size=50,
    symmetric_normalization=False,
)
test_gen = gen.flow(
    list(test_graphs.index - 1),
    targets=test_graphs.values,
    batch_size=1,
    symmetric_normalization=False,
)
The problem was caused specifically by train_graphs.index - 1 and test_graphs.index - 1. The indices already range from 0 to n, so subtracting one shifts the graph data back by one position, causing each data point to get the label of a different data point.
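To illustrate the shift with made-up data (a minimal sketch using a small pandas Series, not the actual StellarGraph generator):

import pandas as pd

# Hypothetical labels whose index already starts at 0, like the shuffled graph_labels
labels = pd.Series(["A", "B", "C", "D"])

print(list(labels.index))      # [0, 1, 2, 3]  -> correct graph ids
print(list(labels.index - 1))  # [-1, 0, 1, 2] -> every graph is paired with the wrong label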
To fix this, simply change them to train_graphs.index and test_graphs.index, without the - 1 at the end.
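For clarity, the corrected generator calls look like this (identical to the demo's, with only the - 1 removed):

train_gen = gen.flow(
    list(train_graphs.index),
    targets=train_graphs.values,
    batch_size=50,
    symmetric_normalization=False,
)
test_gen = gen.flow(
    list(test_graphs.index),
    targets=test_graphs.values,
    batch_size=1,
    symmetric_normalization=False,
)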
Upvotes: 0