Sara De Luca

Reputation: 35

Normalization of the dataset, Error: all elements of input should be between 0 and 1

I have a problem with data normalization in PyTorch when I try to run training. The dataset consists of 3024 signal windows (1 channel), each 5000 samples long, so the CSV file has dimensions 5000x3024. Each signal has one label that needs to be predicted. Here is the code I use to load and normalize the data:

# imports needed for this snippet
import numpy as np
from pandas import read_csv
from sklearn.preprocessing import LabelEncoder
from torch.utils.data import Dataset, DataLoader

class CSVDataset(Dataset):
    # load the dataset
    def __init__(self, path, normalize = False):
        # load the csv file as a dataframe
        df = read_csv(path)
        df = df.transpose()
        # store the inputs and outputs
        self.X = df.values[:, :-1]
        self.y = df.values[:, -1]
        print("Dataset length: ", self.X.shape[0])
        # ensure input data is floats
        self.X = self.X.astype(float)
        self.y = self.y.astype(float)
        
        if normalize:
            self.X = self.X.reshape(self.X.shape[1], self.X.shape[0])
            min_X = np.min(self.X,0)  # array with the minimum of each column (one per signal window)
            max_X = np.max(self.X,0)
            self.X = (self.X - min_X)/(max_X-min_X)
            min_y = np.min(self.y) 
            max_y = np.max(self.y)
            self.y = (self.y - min_y)/(max_y-min_y)
        
        # reshape input data
        self.X = self.X.reshape(self.X.shape[0], 1, self.X.shape[1])
        self.y = self.y.reshape(self.y.shape[0], 1)
        # label encode target and ensure the values are floats
        self.y = LabelEncoder().fit_transform(self.y)
        self.y = self.y.astype(float)

# prepare the dataset
def prepare_data(path):
    # load the dataset
    dataset = CSVDataset(path, normalize = True)
    # calculate split
    train, test = dataset.get_splits()
    # prepare data loaders
    train_dl = DataLoader(train, batch_size=32, shuffle=True)
    test_dl = DataLoader(test, batch_size=1024, shuffle=False)
    return train_dl, test_dl
    

The training function is:

# imports needed for this snippet
import torch
from torch.nn import BCELoss
from torch.optim import SGD

def train_model(train_dl, model):
    # define the optimization
    criterion = BCELoss()
    optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)
    model = model.float()
    # enumerate epochs
    for epoch in range(100):
        # enumerate mini batches
        for i, (inputs, targets) in enumerate(iter(train_dl)):
            targets = torch.reshape(targets, (32, 1))
            # clear the gradients
            optimizer.zero_grad()
            # compute the model output
            yhat = model(inputs.float())
            # calculate loss
            loss = criterion(yhat, targets.float())
            # credit assignment
            loss.backward()
            # update model weights
            optimizer.step()

The error is raised at the line loss = criterion(yhat, targets.float()) and reads:

RuntimeError: all elements of input should be between 0 and 1

I have inspected X in the variable explorer and there don't seem to be any values outside [0, 1], so I don't know what I could have done wrong in the normalization. Can you help me?
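For reference, a quick programmatic check of the normalized arrays looks like this (using the CSVDataset class above; path is the same CSV file):

# quick check of the normalized arrays
dataset = CSVDataset(path, normalize=True)
print(dataset.X.min(), dataset.X.max())  # should both lie within [0, 1] if normalization worked
print(dataset.y.min(), dataset.y.max())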

Upvotes: 1

Views: 1055

Answers (1)

Ivan

Reputation: 40728

PyTorch's built-in loss functions use the names input and target to designate the prediction and the label respectively. The error message should therefore be read as "the input of the criterion", i.e. yhat, not as "the input of the model".
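For instance, a standalone snippet reproduces the same error, independent of your model (which is not shown in the question):

import torch
from torch.nn import BCELoss

criterion = BCELoss()
target = torch.tensor([1.0, 0.0])

# raw logits outside [0, 1]: calling the criterion raises
# "RuntimeError: all elements of input should be between 0 and 1"
logits = torch.tensor([1.7, -0.3])
# criterion(logits, target)  # raises

# probabilities obtained by applying a sigmoid are accepted
probs = torch.sigmoid(logits)
loss = criterion(probs, target)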

It seems yhat is not in [0, 1], while BCELoss expects a probability, not a logit. You can either (a minimal sketch of both options follows the list):

  • add a sigmoid layer as the last layer of your model, or

  • use nn.BCEWithLogitsLoss instead, which combines a sigmoid and the BCE loss in a single, numerically more stable operation.
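Here is that sketch, with a hypothetical stand-in for your model (your actual architecture is not shown in the question; the layer sizes below are only illustrative):

import torch
from torch import nn

# Option 1: keep BCELoss and end the model with a Sigmoid,
# so the output is a probability in [0, 1]
model_a = nn.Sequential(
    nn.Flatten(),
    nn.Linear(5000, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
    nn.Sigmoid(),
)
criterion_a = nn.BCELoss()

# Option 2: leave the last layer as raw logits and use BCEWithLogitsLoss,
# which applies the sigmoid internally and is numerically more stable
model_b = nn.Sequential(
    nn.Flatten(),
    nn.Linear(5000, 64),
    nn.ReLU(),
    nn.Linear(64, 1),   # no Sigmoid here
)
criterion_b = nn.BCEWithLogitsLoss()

# dummy batch with the shapes from the question: (batch, 1 channel, 5000 samples)
x = torch.randn(32, 1, 5000)
target = torch.randint(0, 2, (32, 1)).float()
loss_a = criterion_a(model_a(x), target)
loss_b = criterion_b(model_b(x), target)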

Upvotes: 1
