Giorgio

Reputation: 65

PyTorch Binary classification not learning

Let me preface this by saying that I'm new to PyTorch. I wrote this simple program for binary classification. I also created the CSV with two columns of random values, plus an "ok" column whose value is 1 only if the other two values each fall between bounds I chose beforehand. Example:

diam_int,diam_est,ok
37.782,125.507,0
41.278,115.15,1
42.248,115.489,1
29.582,113.141,0
37.428,107.247,0
32.947,123.233,0
37.146,121.537,0
38.537,110.032,0
26.553,113.752,0
27.369,121.144,0
41.632,108.178,0
27.655,111.279,0
29.779,109.268,0
43.695,115.649,1
44.587,116.126,0
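
For reference, here is roughly the kind of script that produces such a file (the bounds below are illustrative, not my exact ones):

import csv
import random

# Illustrative generation sketch; the real bounds differ.
with open("test.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["diam_int", "diam_est", "ok"])
    for _ in range(10000):
        diam_int = round(random.uniform(25.0, 45.0), 3)
        diam_est = round(random.uniform(105.0, 126.0), 3)
        # ok is 1 only when both values fall inside the chosen ranges
        ok = int(40.0 <= diam_int <= 44.0 and 112.0 <= diam_est <= 118.0)
        writer.writerow([diam_int, diam_est, ok])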

Everything seems correct to me, and the loss actually decreases (it climbs back up slightly after many epochs, but I don't think that's a problem). However, when I test my Net after training on a sample batch from the training set, what I get is always a prediction below 0.5 (so the estimated output is always 0), with a completely random trend.

with torch.no_grad():
    pred = net(trainSet[10])
    trueVal = ySet[10]
    for i in range(len(trueVal)):
        print(trueVal[i], pred[i])

Here is my Net class:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 32)
        self.fc2 = nn.Linear(32, 64)
        self.fc3 = nn.Linear(64, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return torch.sigmoid(x)

Here is my Main class:

import torch
import torch.optim as optim
import torch.nn.functional as F
import pandas as pd

from net import Net 

df = pd.read_csv("test.csv")
y = torch.Tensor(df["ok"])
ySet = torch.split(y, 32)
df.drop(["ok"], axis=1, inplace=True)
data = F.normalize(torch.Tensor(df.values), dim=1)
trainSet = torch.split(data, 32)

net = Net()
optimizer = optim.Adam(net.parameters(), lr=0.001)
lossFunction = torch.nn.BCELoss()
EPOCHS = 300

for epoch in range(EPOCHS):
    for i, X in enumerate(trainSet):
        optimizer.zero_grad()
        output = net(X)
        target = ySet[i].reshape(-1, 1)
        loss = lossFunction(output, target)
        loss.backward()
        optimizer.step()

    if epoch % 20 == 0:
        print(loss)

What am I doing wrong? Thanks in advance for the help.

Upvotes: 2

Views: 1377

Answers (1)

Deadly Pointer

Reputation: 322

Your model is underfitting. Increasing the number of epochs to (say) 3000 makes the model predict perfectly on the examples you showed.

However, after this many epochs the model may be overfitting. A good practice is to use validation data (split the generated data into train and validation sets) and check the validation loss at each epoch. When the validation loss starts increasing, the model is beginning to overfit, and that is the point to stop training.
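
A minimal sketch of that early-stopping loop, reusing Net, data, and y as built in your script (the 80/20 split and the patience of 20 epochs are arbitrary choices, not anything prescribed):

import torch
import torch.optim as optim

from net import Net

# Assumes `data` and `y` are the full normalized features and labels
# from your script, before any torch.split into batches.
nTrain = int(0.8 * len(data))
trainX, valX = data[:nTrain], data[nTrain:]
trainY = y[:nTrain].reshape(-1, 1)
valY = y[nTrain:].reshape(-1, 1)

trainBatches = torch.split(trainX, 32)
targetBatches = torch.split(trainY, 32)

net = Net()
optimizer = optim.Adam(net.parameters(), lr=0.001)
lossFunction = torch.nn.BCELoss()

bestValLoss = float("inf")
patience, badEpochs = 20, 0  # stop after 20 epochs with no improvement

for epoch in range(3000):
    net.train()
    for X, target in zip(trainBatches, targetBatches):
        optimizer.zero_grad()
        loss = lossFunction(net(X), target)
        loss.backward()
        optimizer.step()

    # Evaluate on the held-out set once per epoch, without gradients
    net.eval()
    with torch.no_grad():
        valLoss = lossFunction(net(valX), valY).item()

    if valLoss < bestValLoss:
        bestValLoss, badEpochs = valLoss, 0
    else:
        badEpochs += 1
        if badEpochs >= patience:
            print(f"Stopping at epoch {epoch}, best val loss {bestValLoss:.4f}")
            break

If the rows aren't already in random order, shuffle them before splitting; otherwise the validation set won't be representative of the training data.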

Upvotes: 2
