I am trying to run link prediction using HinSAGE from the StellarGraph Python package.
I have a network of people and products, with edges from person to person (KNOWS) and from person to product (BOUGHT). Both people and products have a feature vector attached, but a different one for each type (person vectors have length 1024, product vectors length 200). I want to predict person-to-product links using all the information in the network. My reason for choosing HinSAGE is that it supports inductive learning.
I have the code below, and I thought I was doing it the same way as the examples:
https://stellargraph.readthedocs.io/en/stable/demos/link-prediction/hinsage-link-prediction.html https://stellargraph.readthedocs.io/en/stable/demos/link-prediction/graphsage-link-prediction.html
but I keep getting NaN as my output predictions. Does anyone have a suggestion for what I can try?
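To see why a single missing value is enough to turn predictions into NaN, here is a minimal NumPy sketch. It is not StellarGraph's implementation: it assumes a simplified GraphSAGE-style layer (mean over sampled neighbours, one weight matrix per input, ReLU) followed by a sigmoid link score, with toy feature size 4 instead of 1024/200 and random weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumption): 4 features per node, 2 hidden units per branch.
W_SELF = rng.normal(size=(4, 2))
W_NEIGH = rng.normal(size=(4, 2))
W_OUT = rng.normal(size=4)


def sage_like_layer(self_feats, neigh_feats):
    """Simplified mean aggregator: concat(self @ W_self, mean(neigh) @ W_neigh), then ReLU."""
    h_self = self_feats @ W_SELF
    h_neigh = neigh_feats.mean(axis=0) @ W_NEIGH
    # np.maximum propagates NaN, so a NaN input stays NaN after this ReLU.
    return np.maximum(np.concatenate([h_self, h_neigh]), 0.0)


def predict(self_feats, neigh_feats):
    """Sigmoid link score computed from the toy layer's output."""
    h = sage_like_layer(self_feats, neigh_feats)
    return 1.0 / (1.0 + np.exp(-(h @ W_OUT)))


def bce(y_true, y_pred):
    """Binary cross-entropy for a single example."""
    return -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))


person = rng.normal(size=4)
neighbours = rng.normal(size=(3, 4))
clean_score = predict(person, neighbours)

dirty = neighbours.copy()
dirty[1, 2] = np.nan  # one missing value in one sampled neighbour
dirty_score = predict(person, dirty)

print(clean_score, bce(1.0, clean_score))  # finite values
print(dirty_score, bce(1.0, dirty_score))  # prints: nan nan
```

Once the loss of a batch is NaN, its gradients are NaN as well, and an optimizer step writes NaN into the weights; from then on predictions can be NaN even for links whose nodes have clean features.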
```python
import networkx as nx
import pandas as pd
import numpy as np
from tensorflow.keras import Model, optimizers, losses, metrics
import stellargraph as sg
from stellargraph.data import EdgeSplitter
from stellargraph.mapper import HinSAGELinkGenerator
from stellargraph.layer import HinSAGE, link_classification, link_regression
from sklearn.model_selection import train_test_split

print(graph.info())
# StellarGraph: Undirected multigraph
#  Nodes: 54226, Edges: 259120
#
#  Node types:
#   products: [45027]
#     Features: float32 vector, length 200
#     Edge types: products-BOUGHT->person
#   person: [9199]
#     Features: float32 vector, length 1024
#     Edge types: person-KNOWS->person, person-BOUGHT->product
#
#  Edge types:
#     person-KNOWS->person: [246131]
#         Weights: all 1 (default)
#         Features: none
#     person-BOUGHT->product: [12989]
#         Weights: all 1 (default)
#         Features: none
```
```python
from tensorflow.keras import Model, optimizers, losses
from stellargraph.data import EdgeSplitter
from stellargraph.mapper import HinSAGELinkGenerator
from stellargraph.layer import HinSAGE, link_classification

# Hold out 10% of the BOUGHT edges (positive and negative examples) for testing;
# graph_test is the graph with those positive edges removed
edge_splitter_test = EdgeSplitter(graph_test := None) if False else EdgeSplitter(graph)
graph_test, edges_test, labels_test = edge_splitter_test.train_test_split(
    p=0.1, method="global", edge_label="BOUGHT"
)

# Sample training edges from the reduced graph; graph_train has them removed too
edge_splitter_train = EdgeSplitter(graph_test, graph)
graph_train, edges_train, labels_train = edge_splitter_train.train_test_split(
    p=0.1, method="global", edge_label="BOUGHT"
)

num_samples = [8, 4]
batch_size = 20
epochs = 20

# Sample neighbourhoods from the reduced graphs (as in the StellarGraph
# link-prediction demos), so held-out edges are not visible during training
generator = HinSAGELinkGenerator(
    graph_train, batch_size, num_samples, head_node_types=["person", "product"]
)
train_gen = generator.flow(edges_train, labels_train, shuffle=True)
test_gen = HinSAGELinkGenerator(
    graph_test, batch_size, num_samples, head_node_types=["person", "product"]
).flow(edges_test, labels_test)

hinsage_layer_sizes = [32, 32]
assert len(hinsage_layer_sizes) == len(num_samples)
hinsage = HinSAGE(
    layer_sizes=hinsage_layer_sizes, generator=generator, bias=True, dropout=0.0
)

# Expose input and output sockets of hinsage:
x_inp, x_out = hinsage.in_out_tensors()

# Final estimator layer: binary link score from the concatenated node embeddings
prediction = link_classification(
    output_dim=1, output_act="sigmoid", edge_embedding_method="concat"
)(x_out)

model = Model(inputs=x_inp, outputs=prediction)
model.compile(
    optimizer=optimizers.Adam(),
    loss=losses.binary_crossentropy,
    metrics=["acc"],
)
history = model.fit(train_gen, epochs=epochs, validation_data=test_gen, verbose=2)
```
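Before calling `model.fit`, it is cheap to check the per-type feature arrays for non-finite values. The sketch below assumes the features are available as a dict mapping node type to a 2-D array; the function name `find_bad_rows` and the toy arrays are hypothetical, and only the widths (1024 and 200) come from the question.

```python
import numpy as np


def find_bad_rows(features_by_type):
    """Return {node_type: row indices containing NaN or +/-inf}, only for types with problems."""
    bad = {}
    for node_type, feats in features_by_type.items():
        feats = np.asarray(feats, dtype=float)
        # A row is bad if any of its entries is not finite
        rows = np.flatnonzero(~np.isfinite(feats).all(axis=1))
        if rows.size:
            bad[node_type] = rows
    return bad


# Hypothetical toy features with the widths from the question
person_feats = np.zeros((5, 1024), dtype=np.float32)
product_feats = np.ones((4, 200), dtype=np.float32)
product_feats[2, 17] = np.nan  # one missing value

print(find_bad_rows({"person": person_feats, "product": product_feats}))
# prints: {'product': array([2])}
```

With a StellarGraph 1.x object, the dict could presumably be built as `{t: graph.node_features(node_type=t) for t in graph.node_types}`; check that against the API reference of the installed version.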
So I found the problem; it might be useful for others. If any node has missing values (NaN) in its feature vector, the model will just produce NaN predictions. This is especially dangerous if you build your graph by joining pandas DataFrames: I had a typo in one of the files I merged in, and that led to the problem.
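A minimal pandas sketch of how this can happen, with hypothetical IDs and column names: a left join on a key containing a typo fills the unmatched rows with NaN and raises no error. Passing `indicator=True` to `DataFrame.merge` adds a `_merge` column that makes those rows easy to find before the graph is built.

```python
import pandas as pd

# Hypothetical data: the second file has a typo in one ID ("p_02" instead of "p02")
people = pd.DataFrame({"person_id": ["p01", "p02", "p03"]})
embeddings = pd.DataFrame(
    {"person_id": ["p01", "p_02", "p03"], "f0": [0.1, 0.2, 0.3], "f1": [1.0, 2.0, 3.0]}
)

# The left join silently fills the unmatched row's features with NaN
merged = people.merge(embeddings, on="person_id", how="left", indicator=True)
print(merged)

# Find rows that had no match, and rows with any missing feature value
unmatched = merged.loc[merged["_merge"] == "left_only", "person_id"].tolist()
has_nan = merged.drop(columns="_merge").isna().any(axis=1)
rows_with_nan = merged.loc[has_nan, "person_id"].tolist()
print(unmatched, rows_with_nan)  # prints: ['p02'] ['p02']
```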