V_sqrt

Reputation: 567

IndexError: The shape of the mask [...] at index 0 does not match the shape of the indexed tensor [...] at index 0

I am trying to use Torch for Label Propagation. I have a dataframe that looks like

ID   Target   Weight   Label
1      12       0.4      1
2      24       0.1      0
4      13       0.5      1
4      12       0.3      1
12     1        0.1      1
12     4        0.4      1
13     4        0.2      1
17     1        0.1      0

and so on.

I built the network as follows:

G = nx.from_pandas_edgelist(df, source='ID', target='Target', edge_attr=['Weight']) 

and the adjacency matrix

adj_matrix = nx.adjacency_matrix(G).toarray()

I have only two labels, 0 and 1, and some of the data is unlabelled. I created the input tensors as follows:

# Create input tensors
adj_matrix_t = torch.FloatTensor(adj_matrix)
labels_t = torch.LongTensor(df['Label'].tolist())

I then run the following code:

# Learn with Label Propagation
label_propagation = LabelPropagation(adj_matrix_t)
label_propagation.fit(labels_t) # this is causing the error

I get the error: IndexError: The shape of the mask [196] at index 0 does not match the shape of the indexed tensor [207] at index 0. I checked adj_matrix_t.shape, which is (207, 207), while labels_t has 196 entries. How can I fix this inconsistency?

Please see below the error track:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-42-cf4f88a4bb12> in <module>
      2 label_propagation = LabelPropagation(adj_matrix_t)
      3 print("Label Propagation: ", end="")
----> 4 label_propagation.fit(labels_t)
      5 label_propagation_output_labels = label_propagation.predict_classes()
      6 

<ipython-input-1-54a7dbc30bd1> in fit(self, labels, max_iter, tol)
    100 
    101     def fit(self, labels, max_iter=1000, tol=1e-3):
--> 102         super().fit(labels, max_iter, tol)
    103 
    104 ## Label spreading

<ipython-input-1-54a7dbc30bd1> in fit(self, labels, max_iter, tol)
     58             Convergence tolerance: threshold to consider the system at steady state.
     59         """
---> 60         self._one_hot_encode(labels)
     61 
     62         self.predictions = self.one_hot_labels.clone()

<ipython-input-1-54a7dbc30bd1> in _one_hot_encode(self, labels)
     43         self.one_hot_labels = torch.zeros((self.n_nodes, self.n_classes), dtype=torch.float)
     44         self.one_hot_labels = self.one_hot_labels.scatter(1, labels.unsqueeze(1), 1)
---> 45         self.one_hot_labels[unlabeled_mask, 0] = 0
     46 
     47         self.labeled_mask = ~unlabeled_mask

The code below is the example I would like to adapt for label propagation. The error seems to be caused by the labels. Some nodes in my dataset have no label (although in the example above every row has one). Could this be causing the error?

Original code (for reference: https://mybinder.org/v2/gh/thibaudmartinez/label-propagation/master?filepath=notebook.ipynb):

## Testing models on synthetic data

import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
import torch

# Create caveman graph
n_cliques = 4
size_cliques = 5
caveman_graph = nx.connected_caveman_graph(n_cliques, size_cliques)
adj_matrix = nx.adjacency_matrix(caveman_graph).toarray()


# Create labels
labels = np.full(n_cliques * size_cliques, -1.)

# Only one node per clique is labeled. Each clique belongs to a different class.
labels[0] = 0
labels[size_cliques] = 1
labels[size_cliques * 2] = 2
labels[size_cliques * 3] = 3

# Create input tensors
adj_matrix_t = torch.FloatTensor(adj_matrix)
labels_t = torch.LongTensor(labels)

# Learn with Label Propagation
label_propagation = LabelPropagation(adj_matrix_t)
print("Label Propagation: ", end="")
label_propagation.fit(labels_t)
label_propagation_output_labels = label_propagation.predict_classes()

# Learn with Label Spreading
label_spreading = LabelSpreading(adj_matrix_t)
print("Label Spreading: ", end="")
label_spreading.fit(labels_t, alpha=0.8)
label_spreading_output_labels = label_spreading.predict_classes()

# Plot graphs
color_map = {-1: "orange", 0: "blue", 1: "green", 2: "red", 3: "cyan"}
input_labels_colors = [color_map[l] for l in labels]
lprop_labels_colors = [color_map[l] for l in label_propagation_output_labels.numpy()]
lspread_labels_colors = [color_map[l] for l in label_spreading_output_labels.numpy()]

plt.figure(figsize=(14, 6))
ax1 = plt.subplot(1, 4, 1)
ax2 = plt.subplot(1, 4, 2)
ax3 = plt.subplot(1, 4, 3)

ax1.title.set_text("Raw data (4 classes)")
ax2.title.set_text("Label Propagation")
ax3.title.set_text("Label Spreading")

pos = nx.spring_layout(caveman_graph)
nx.draw(caveman_graph, ax=ax1, pos=pos, node_color=input_labels_colors, node_size=50)
nx.draw(caveman_graph, ax=ax2, pos=pos, node_color=lprop_labels_colors, node_size=50)
nx.draw(caveman_graph, ax=ax3, pos=pos, node_color=lspread_labels_colors, node_size=50)

# Legend
ax4 = plt.subplot(1, 4, 4)
ax4.axis("off")
legend_colors = ["orange", "blue", "green", "red", "cyan"]
legend_labels = ["unlabeled", "class 0", "class 1", "class 2", "class 3"]
dummy_legend = [ax4.plot([], [], ls='-', c=c)[0] for c in legend_colors]
plt.legend(dummy_legend, legend_labels)

plt.show()

If my example dataset at the top of this post does not suit the original code because of the labels, I would greatly appreciate another example showing what the labels (which determine the classes of the nodes) should look like in the dataset, including missing values to be predicted.

Upvotes: 2

Views: 4525

Answers (1)

Frodnar

Reputation: 2252

For other readers here, it seems like this is the implementation being asked about in this question.

The method you are using to try to predict labels works with labels for nodes, not edges. To visualize this, I plotted your example data and colored the plot by your Weight and Label columns (code to produce plot appended below) where Weight is the line thickness of the edge and Label is the color:

[Figure: network plot of the sample data; edge thickness shows Weight, edge color shows Label]
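A quick way to see where the shape mismatch comes from, sketched on the sample data from the question. The variable names are mine, and nx.to_numpy_array is used here as a dense stand-in for nx.adjacency_matrix(G).toarray(). The edge list has one row per edge, while the adjacency matrix has one row per node, so a label vector taken straight from the DataFrame column will generally have the wrong length:

```python
import networkx as nx
import pandas as pd

# Sample edge list from the question: one row per (ID, Target) pair
df = pd.DataFrame({
    "ID":     [1, 2, 4, 4, 12, 12, 13, 17],
    "Target": [12, 24, 13, 12, 1, 4, 4, 1],
    "Weight": [0.4, 0.1, 0.5, 0.3, 0.1, 0.4, 0.2, 0.1],
    "Label":  [1, 0, 1, 1, 1, 1, 1, 0],
})

G = nx.from_pandas_edgelist(df, source="ID", target="Target", edge_attr=["Weight"])

n_rows = len(df)                 # length of df["Label"]
n_nodes = G.number_of_nodes()    # what the adjacency matrix is sized by
n_edges = G.number_of_edges()    # undirected graph: reversed duplicates collapse

# The attribute is stored as "Weight", so it must be named explicitly;
# the default weight key is the lowercase "weight".
adj = nx.to_numpy_array(G, weight="Weight")

print(n_rows, n_nodes, n_edges, adj.shape)  # 8 7 5 (7, 7)
```

Here the label column has 8 entries but the matrix is 7 x 7, the same kind of mismatch as 196 versus 207 in the question.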

In order to use this method, you will need to produce data that looks like this, where each node (denoted by ID) gets exactly one node_label:

ID    node_label
1         1
2         0
4         1
12        1
13        1
17        0

To be clear, you will still need your original data above to build the network and the adjacency matrix, but you will have to decide on some logical rule to turn your edge labels into node labels. Then, once you have predicted your unlabeled nodes, you can reverse your rule to obtain edge labels if necessary.
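One possible rule, shown only as a sketch and not as the right choice for your data: give each node the most frequent Label among the rows where it appears as ID, and mark every other node as unlabeled with -1, following the convention of the caveman example in the question. The rule itself and the names label_by_id, node_order and node_labels are assumptions made for illustration. The important part is that the labels are listed in the same order as G.nodes(), which is the order the adjacency matrix uses by default:

```python
import networkx as nx
import pandas as pd

# Sample edge list from the question
df = pd.DataFrame({
    "ID":     [1, 2, 4, 4, 12, 12, 13, 17],
    "Target": [12, 24, 13, 12, 1, 4, 4, 1],
    "Weight": [0.4, 0.1, 0.5, 0.3, 0.1, 0.4, 0.2, 0.1],
    "Label":  [1, 0, 1, 1, 1, 1, 1, 0],
})

G = nx.from_pandas_edgelist(df, source="ID", target="Target", edge_attr=["Weight"])

# Hypothetical rule: majority vote over the rows where the node is the ID.
# mode() returns sorted values, so ties resolve to the smallest label.
label_by_id = df.groupby("ID")["Label"].agg(lambda s: int(s.mode().iloc[0]))

# One label per node, in adjacency-matrix order; -1 marks unlabeled nodes
node_order = list(G.nodes())
node_labels = [int(label_by_id.get(node, -1)) for node in node_order]

for node, label in zip(node_order, node_labels):
    print(node, label)
```

Passing node_labels to torch.LongTensor then gives a tensor with one entry per row of the adjacency matrix built from the same G. On the sample data, node 24 only ever appears as a Target, so it ends up as -1 and is left for the propagation to predict.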

It's not a strictly rigorous method, but it's practical and likely to yield somewhat sensible results if your data isn't just random noise.


Code appendix:

# Sample data network plot

import networkx as nx
import pandas as pd

data = {'ID': {0: 1, 1: 2, 2: 4, 3: 4, 4: 12, 5: 12, 6: 13, 7: 17},
        'Target': {0: 12, 1: 24, 2: 13, 3: 12, 4: 1, 5: 4, 6: 4, 7: 1},
        'Weight': {0: 0.4, 1: 0.1, 2: 0.5, 3: 0.3, 4: 0.1, 5: 0.4, 6: 0.2, 7: 0.1},
        'Label': {0: 1, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 0}}

df = pd.DataFrame.from_dict(data)

G = nx.from_pandas_edgelist(df, source='ID', target='Target', edge_attr=['Weight', 'Label']) 

# Edge thickness is proportional to Weight; edge color encodes Label
width = [20 * d['Weight'] for (u, v, d) in G.edges(data=True)]
edge_color = [d['Label'] for (u, v, d) in G.edges(data=True)]
nx.draw_networkx(G, width=width, edge_color=edge_color)

Upvotes: 1
