Reputation: 567
I have a dataset which looks like
   ID  Target  Weight  Score  Scale_Cat  Scale_num
0  A   D       65.1    87     Up         1
1  A   X       35.8    87     Up         1
2  B   C       34.7    37.5   Down       -2
3  B   P       33.4    37.5   Down       -2
4  C   B       33.1    37.5   Down       -2
5  S   X       21.4    12.5   NA         9
This dataset consists of nodes (ID) and their targets (neighbors), and I use it as a sample for testing label propagation. The classes/labels are in the column Scale_num and take integer values from -2 to 2. The label 9 means unlabelled; these are the nodes whose label I would like to predict with the label propagation algorithm.
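For reproducibility, here is a minimal sketch that rebuilds the sample above as a pandas DataFrame. The name df matches the code further down; treating the NA entry in Scale_Cat as a missing value is an assumption.

```python
import pandas as pd

# Sample rows copied from the table above; None stands in for the "NA" entry (assumption)
df = pd.DataFrame({
    "ID": ["A", "A", "B", "B", "C", "S"],
    "Target": ["D", "X", "C", "P", "B", "X"],
    "Weight": [65.1, 35.8, 34.7, 33.4, 33.1, 21.4],
    "Score": [87, 87, 37.5, 37.5, 37.5, 12.5],
    "Scale_Cat": ["Up", "Up", "Down", "Down", "Down", None],
    "Scale_num": [1, 1, -2, -2, -2, 9],  # 9 marks the unlabelled node
})

print(df)
```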
While searching for examples of label propagation, I found this code useful (the difference is in the label assignment: my df already says which rows are labelled, from -2 to 2 in steps of 1, and which are unlabelled, i.e. 9): https://mybinder.org/v2/gh/thibaudmartinez/label-propagation/master?filepath=notebook.ipynb
However, when I tried to use my classes instead of (-1, 0, 1) as in the original code, I got some errors. A user provided some help with fixing a RuntimeError here: RunTimeError during one hot encoding, unfortunately still without success.
In the answer provided at that link, 40 observations and labels are randomly generated:
import random
labels = list()
for i in range(0,40):
    labels.append(list([(lambda x: x+2 if x !=9 else 5)(random.sample(classes,1)[0])]))
index_aka_labels = torch.tensor(labels)
torch.zeros(40, 6, dtype=src.dtype).scatter_(1, index_aka_labels, 1)
I am still getting a RuntimeError, which still seems to be caused by a wrong encoding. This is what I tried:
import random
labels = list(df['Scale_num'])
index_aka_labels = torch.tensor(labels)
torch.zeros(len(df), 6, dtype=src.dtype).scatter_(1, index_aka_labels, 1)
which gives the error:
---> 7 torch.zeros(len(df), 6, dtype=src.dtype).scatter_(1, index_aka_labels, 1)
RuntimeError: Index tensor must have the same number of dimensions as self tensor
I am surely missing something (e.g., how to use classes and labels, as well as src, which is never defined in the answer at that link). The two functions in the original code that cause the error are as follows:
def _one_hot_encode(self, labels):
    # Get the number of classes
    classes = torch.unique(labels) # probably this should be replaced
    classes = classes[classes != -1] # unlabelled. In my df the unlabelled class is identified by 9
    self.n_classes = classes.size(0)
    # One-hot encode labeled data instances and zero rows corresponding to unlabeled instances
    unlabeled_mask = (labels == -1) # In my df the unlabelled class is identified by 9
    labels = labels.clone() # defensive copying
    labels[unlabeled_mask] = 0
    self.one_hot_labels = torch.zeros((self.n_nodes, self.n_classes), dtype=torch.float)
    self.one_hot_labels = self.one_hot_labels.scatter(1, labels.unsqueeze(1), 1)
    self.one_hot_labels[unlabeled_mask, 0] = 0
    self.labeled_mask = ~unlabeled_mask

def fit(self, labels, max_iter, tol):
    self._one_hot_encode(labels)
    self.predictions = self.one_hot_labels.clone()
    prev_predictions = torch.zeros((self.n_nodes, self.n_classes), dtype=torch.float)
    for i in range(max_iter):
        # Stop iterations if the system is considered at a steady state
        variation = torch.abs(self.predictions - prev_predictions).sum().item()
        prev_predictions = self.predictions
        self._propagate()
I would like to understand how to use my classes/labels definition and the information in my df correctly, so that the label propagation algorithm runs without errors.
Upvotes: 1
Views: 2036
Reputation: 3283
I suspect it's complaining about index_aka_labels lacking the singleton dimension. Note that in your example which works:
import random
labels = list()
for i in range(0,40):
    labels.append(list([(lambda x: x+2 if x !=9 else 5)(random.sample(classes,1)[0])]))
index_aka_labels = torch.tensor(labels)
torch.zeros(40, 6, dtype=src.dtype).scatter_(1, index_aka_labels, 1)
If you run index_aka_labels.shape, it returns (40, 1). When you just turn your pandas series into a tensor, however, you get a tensor of shape (M,) (where M is the length of the series). If you simply run:
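import torch
labels = list(df['Scale_num'])
index_aka_labels = torch.tensor(labels)[:, None]  # add a singleton dimension: (M,) -> (M, 1)
# src is not defined in the question, so torch.float is assumed as the dtype here
torch.zeros(len(df), 6, dtype=torch.float).scatter_(1, index_aka_labels, 1)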
the error should disappear.
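As a minimal sketch of the shape issue on its own: the index values below are made up and already lie in the range 0 to 5, so only the number of dimensions differs between the failing and the working call.

```python
import torch

# Hypothetical indices, already in the valid range 0..5 for a 6-column target
flat = torch.tensor([3, 3, 0, 0, 0, 5])   # 1-D, shape (6,)
column = flat[:, None]                    # 2-D, shape (6, 1); same as flat.unsqueeze(1)

# A 1-D index against a 2-D target raises the RuntimeError from the question
try:
    torch.zeros(6, 6).scatter_(1, flat, 1)
    raised = False
except RuntimeError as err:
    raised = True
    print("1-D index fails:", err)

# With the extra dimension, each row gets a single 1 in the indexed column
one_hot = torch.zeros(6, 6).scatter_(1, column, 1)
print(one_hot)
```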
One more thing, you are not converting your labels into indices as you did in the top example. To do that, you can run:
import torch
labels = list(df['Scale_num'])
index_aka_labels = torch.tensor(labels)[:, None]  # add a singleton dimension: (M,) -> (M, 1)
index_aka_labels = index_aka_labels + 2  # shift labels [-2, -1, 0, 1, 2] to indices [0, 1, 2, 3, 4]
index_aka_labels[index_aka_labels == 11] = 5  # the unlabelled value 9 became 11 after the shift; map it to index 5
# src is not defined in the question, so torch.float is assumed as the dtype here
torch.zeros(len(df), 6, dtype=torch.float).scatter_(1, index_aka_labels, 1)
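If you would rather follow the notebook's _one_hot_encode, which leaves unlabelled rows all-zero instead of giving them a sixth column, something like the sketch below might work. It is a standalone function rather than the notebook's method, and the names one_hot_encode, n_classes, offset and unlabeled are my own. It also fixes the number of classes at 5 instead of using torch.unique, on the assumption that labels always run from -2 to 2; with torch.unique, a sample like yours that only contains -2 and 1 would give n_classes = 2 and the shifted index 3 would be out of range.

```python
import torch

def one_hot_encode(labels, n_classes=5, offset=2, unlabeled=9):
    """One-hot encode labels in [-2, 2]; rows equal to `unlabeled` stay all-zero."""
    unlabeled_mask = labels == unlabeled
    indices = labels + offset              # shift -2..2 to 0..4 (new tensor, input untouched)
    indices[unlabeled_mask] = 0            # placeholder index, zeroed again below
    one_hot = torch.zeros((labels.size(0), n_classes), dtype=torch.float)
    one_hot = one_hot.scatter(1, indices.unsqueeze(1), 1)
    one_hot[unlabeled_mask, 0] = 0         # unlabelled rows carry no class information
    return one_hot, ~unlabeled_mask

# Scale_num values from the sample in the question
labels = torch.tensor([1, 1, -2, -2, -2, 9])
one_hot, labeled_mask = one_hot_encode(labels)
print(one_hot)
print(labeled_mask)
```

The returned one_hot and labeled_mask play the role of self.one_hot_labels and self.labeled_mask in the notebook code; predicted class indices can be mapped back to the original labels by subtracting the offset.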
Upvotes: 1