V_sqrt

Reputation: 567

Index tensor must have the same number of dimensions as self tensor

I have a dataset which looks like

   ID  Target  Weight  Score  Scale_Cat  Scale_num
0   A       D    65.1     87         Up          1
1   A       X    35.8     87         Up          1
2   B       C    34.7   37.5       Down         -2
3   B       P    33.4   37.5       Down         -2
4   C       B    33.1   37.5       Down         -2
5   S       X    21.4   12.5         NA          9

This dataset consists of nodes (ID) and targets (their neighbors), and I am using it as a sample for testing label propagation. The classes/labels are in the column Scale_num and take integer values from -2 to 2. The label 9 means unlabelled, and it is the one I would like to predict with the label propagation algorithm.

Looking for examples of label propagation, I found this code useful: https://mybinder.org/v2/gh/thibaudmartinez/label-propagation/master?filepath=notebook.ipynb The difference is in the label assignment: my df already marks which rows are labelled (from -2 to 2 in steps of 1) and which are unlabelled (9). However, when I tried to use my classes instead of (-1, 0, 1) as in the original code, I got errors. A user provided some help in "RunTimeError during one hot encoding" to fix a RuntimeError, unfortunately still without success.

In the answer provided on that link, 40 observations and their labels are randomly generated:

import random
labels = list()
for i in range(0,40):
    labels.append(list([(lambda x: x+2 if x !=9 else 5)(random.sample(classes,1)[0])]))  

index_aka_labels = torch.tensor(labels)
torch.zeros(40, 6, dtype=src.dtype).scatter_(1, index_aka_labels, 1)

The error I am getting is still a RuntimeError, and it still seems to be caused by a wrong encoding. What I tried is the following:

import random
labels = list(df['Scale_num']) 


index_aka_labels = torch.tensor(labels)
torch.zeros(len(df), 6, dtype=src.dtype).scatter_(1, index_aka_labels, 1)

which gives the error:

---> 7 torch.zeros(len(df), 6, dtype=src.dtype).scatter_(1, index_aka_labels, 1)

RuntimeError: Index tensor must have the same number of dimensions as self tensor

I am surely missing something, e.g. how to use the classes and labels, and what src should be (it is never defined in the answer at that link). The two functions in the original code that cause the error are as follows:

def _one_hot_encode(self, labels):
    # Get the number of classes
    classes = torch.unique(labels) # probably this should be replaced
    classes = classes[classes != -1] # unlabelled. In my df the unlabelled class is identified by 9 
    self.n_classes = classes.size(0)

    # One-hot encode labeled data instances and zero rows corresponding to unlabeled instances
    unlabeled_mask = (labels == -1) # In my df the unlabelled class is identified by 9 
    labels = labels.clone()  # defensive copying
    labels[unlabeled_mask] = 0
    self.one_hot_labels = torch.zeros((self.n_nodes, self.n_classes), dtype=torch.float)
    self.one_hot_labels = self.one_hot_labels.scatter(1, labels.unsqueeze(1), 1)
    self.one_hot_labels[unlabeled_mask, 0] = 0

    self.labeled_mask = ~unlabeled_mask

def fit(self, labels, max_iter, tol):
    
    self._one_hot_encode(labels)

    self.predictions = self.one_hot_labels.clone()
    prev_predictions = torch.zeros((self.n_nodes, self.n_classes), dtype=torch.float)

    for i in range(max_iter):
        # Stop iterations if the system is considered at a steady state
        variation = torch.abs(self.predictions - prev_predictions).sum().item()
        if variation < tol:
            break

        prev_predictions = self.predictions
        self._propagate()

I would like to understand how to use my class/label definitions and the information in my df correctly, so that the label propagation algorithm runs without errors.

Upvotes: 1

Views: 2036

Answers (1)

jhso

Reputation: 3283

I suspect it's complaining about index_aka_labels lacking the singleton dimension. Note that in your example that works:

import random
labels = list()
for i in range(0,40):
    labels.append(list([(lambda x: x+2 if x !=9 else 5)(random.sample(classes,1)[0])]))  

index_aka_labels = torch.tensor(labels)
torch.zeros(40, 6, dtype=src.dtype).scatter_(1, index_aka_labels, 1)

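If you run index_aka_labels.shape, it returns torch.Size([40, 1]). When you just turn your pandas Series into a tensor, however, you get a tensor of shape (M,) (where M is the length of the series). scatter_ needs the index tensor to have as many dimensions as the tensor being filled, which here is two. If you simply run: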

import torch
labels = list(df['Scale_num'])
index_aka_labels = torch.tensor(labels)[:,None] # create another dimension
# src is not defined in the snippet, so a float dtype is assumed here
torch.zeros(len(df), 6, dtype=torch.float).scatter_(1, index_aka_labels, 1)

this particular error should disappear.
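To see the shape difference in isolation, here is a minimal sketch that does not need your df. The index values are hypothetical and already lie in the valid column range 0..5, so the only thing that changes between the two calls is the number of dimensions:

```python
import torch

# Hypothetical indices, already shifted into the valid column range 0..5.
flat = torch.tensor([3, 3, 0, 0, 0, 5])   # shape (6,): one dimension
column = flat[:, None]                     # shape (6, 1): same as flat.unsqueeze(1)

try:
    # 1-D index against a 2-D tensor: this is the call that fails.
    torch.zeros(6, 6).scatter_(1, flat, 1)
    raised = False
except RuntimeError:
    raised = True

# 2-D index against a 2-D tensor: one 1 is written per row.
one_hot = torch.zeros(6, 6).scatter_(1, column, 1)

print(flat.shape, column.shape, raised)
```

Only the second call fills the tensor; the first one raises the RuntimeError from the question.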

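One more thing: you are not converting your labels into indices as you did in the top example. The raw values -2 and 9 are not valid column indices for a tensor with 6 columns, so scatter_ will still fail on them. To do the conversion, you can run: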

import torch
labels = list(df['Scale_num'])
index_aka_labels = torch.tensor(labels)[:,None] # create another dimension
index_aka_labels = index_aka_labels + 2 # shift labels [-2,-1,0,1,2] to indices [0,1,2,3,4]
index_aka_labels[index_aka_labels==11] = 5 # label 9 became 11 after the shift; map it to index 5
# src is not defined in the snippet, so a float dtype is assumed here
torch.zeros(len(df), 6, dtype=torch.float).scatter_(1, index_aka_labels, 1)
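If you would rather follow the original _one_hot_encode, which leaves unlabelled rows all zero instead of giving them their own column, a rough sketch could look like this. The function name one_hot_encode, the UNLABELED constant, and the fixed class range -2..2 are my assumptions based on your description, not part of the notebook:

```python
import torch

UNLABELED = 9  # marker for unlabelled rows, as described in the question


def one_hot_encode(labels, unlabeled=UNLABELED):
    """Hypothetical helper: one-hot encode labelled rows, zero out unlabelled ones."""
    labels = torch.as_tensor(labels, dtype=torch.long)
    unlabeled_mask = labels == unlabeled

    # Assumed fixed class set [-2, -1, 0, 1, 2], so the column order does not
    # depend on which classes happen to appear in the data.
    classes = torch.arange(-2, 3)
    index = labels - classes[0]      # shift so that -2 -> 0, ..., 2 -> 4
    index[unlabeled_mask] = 0        # placeholder column; cleared again below

    one_hot = torch.zeros(labels.size(0), classes.size(0))
    one_hot.scatter_(1, index.unsqueeze(1), 1)
    one_hot[unlabeled_mask] = 0      # unlabelled rows carry no class information
    return one_hot, ~unlabeled_mask


# Values copied from the Scale_num column of the sample table.
one_hot, labeled_mask = one_hot_encode([1, 1, -2, -2, -2, 9])
```

With your df you would call one_hot_encode(df['Scale_num'].tolist()). This gives 5 columns rather than 6, so n_classes in the rest of the class would have to be 5 as well.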

Upvotes: 1
