user26579046

Reputation: 1

Understanding the `model.fit` function in Keras and imbalanced datasets

As an exercise, I'm trying to translate a model written in Keras (https://github.com/CVxTz/ECG_Heartbeat_Classification/blob/master/code/baseline_mitbih.py) into PyTorch code. I realize that in Keras much of the training is abstracted away by model.fit(), while in PyTorch one has to write the training loop explicitly. The dataset used to train the aforementioned model can be found on Kaggle (/kaggle/input/heartbeat/mitbih_test.csv). By the looks of it, the data is heavily imbalanced towards one of the five classes.
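
A quick look at the label column seems to confirm this (a minimal sketch; I'm assuming the class label is stored in the last column of the CSV):

import pandas as pd

df = pd.read_csv("/kaggle/input/heartbeat/mitbih_test.csv", header=None)
print(df.iloc[:, -1].value_counts(normalize=True))  # one class accounts for the vast majority of rows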

Converting the model to PyTorch is rather simple, but writing a good training routine has proved to be challenging. A standard way to counteract class imbalance is a weighted cross-entropy loss. I computed the class weights with sklearn's compute_class_weight function and passed them to nn.CrossEntropyLoss. However, with this approach the loss doesn't go down as one would expect. My code is given below; for completeness I first sketch how the data is loaded, followed by the model and the training loop.
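
The data loading is roughly as follows (a minimal sketch; the file path and batch size are placeholders of my own, and I assume the class label sits in the last CSV column):

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader
from sklearn.utils import class_weight

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 187 samples per beat in the first columns, class label (0-4) in the last column
df = pd.read_csv("/kaggle/input/heartbeat/mitbih_test.csv", header=None)
X = torch.tensor(df.iloc[:, :-1].values, dtype=torch.float64).unsqueeze(1)  # shape (N, 1, 187)
Y = torch.tensor(df.iloc[:, -1].values, dtype=torch.long)                   # shape (N,)

train_loader = DataLoader(TensorDataset(X, Y), batch_size=128, shuffle=True)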

class Model(nn.Module):
    def __init__(self, nclass):
        super(Model, self).__init__()
        
        self.conv1 = nn.Conv1d(in_channels=1, out_channels=16, kernel_size=5, padding=0, dtype=float)
        self.conv2 = nn.Conv1d(in_channels=16, out_channels=16, kernel_size=5, padding=0, dtype=float)
        self.pool1 = nn.MaxPool1d(kernel_size=2)
        self.dropout1 = nn.Dropout(0.1)
        
        self.conv3 = nn.Conv1d(in_channels=16, out_channels=32, kernel_size=3, padding=0, dtype=float)
        self.conv4 = nn.Conv1d(in_channels=32, out_channels=32, kernel_size=3, padding=0, dtype=float)
        self.pool2 = nn.MaxPool1d(kernel_size=2)
        self.dropout2 = nn.Dropout(0.1)
        
        self.conv5 = nn.Conv1d(in_channels=32, out_channels=32, kernel_size=3, padding=0, dtype=float)
        self.conv6 = nn.Conv1d(in_channels=32, out_channels=32, kernel_size=3, padding=0, dtype=float)
        self.pool3 = nn.MaxPool1d(kernel_size=2)
        self.dropout3 = nn.Dropout(0.1)
        
        self.conv7 = nn.Conv1d(in_channels=32, out_channels=256, kernel_size=3, padding=0, dtype=float)
        self.conv8 = nn.Conv1d(in_channels=256, out_channels=256, kernel_size=3, padding=0, dtype=float)
        self.global_max_pool = nn.AdaptiveMaxPool1d(1)
        self.dropout4 = nn.Dropout(0.2)
        
        self.fc1 = nn.Linear(256, 64, dtype=float)
        self.fc2 = nn.Linear(64, 64, dtype=float)
        self.fc3 = nn.Linear(64, nclass, dtype=float)
    
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.pool1(x)
        x = self.dropout1(x)
        
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        x = self.pool2(x)
        x = self.dropout2(x)
        
        x = F.relu(self.conv5(x))
        x = F.relu(self.conv6(x))
        x = self.pool3(x)
        x = self.dropout3(x)
        
        x = F.relu(self.conv7(x))
        x = F.relu(self.conv8(x))
        
        x = self.global_max_pool(x)
        x = torch.flatten(x, 1)  # Flatten the output for fully connected layers
        x = self.dropout4(x)
        
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        
        return torch.softmax(x, dim=1)
# Hyperparameters
input_dim = 187  # Original time series length
num_classes = 5  # Number of output classes

# Instantiate the model
model = Model(num_classes)
# print(model)
model.to(device)
# Define loss function and optimizer
class_weights = class_weight.compute_class_weight('balanced', classes=np.unique(Y), y=Y.numpy())
class_weights = torch.tensor(class_weights, dtype=torch.float64).to(device)
criterion = nn.CrossEntropyLoss(weight=class_weights)
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model
n_epochs = 1000
for epoch in range(n_epochs):
    model.train()
    last_loss = 0
    running_loss = 0
    
    for i, (X_batch, Y_batch) in enumerate(train_loader):
        X_batch = X_batch.to(device)
        Y_batch = Y_batch.to(device)
        output = model(X_batch)
        loss = criterion(output, Y_batch)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
     
        running_loss += loss.item()
        if i % 200 == 199:
            last_loss = running_loss / 200 # loss per batch
            print('  batch {} loss: {}'.format(i + 1, last_loss))
            running_loss = 0.

On the Keras side, simply calling

model.fit(X, Y, epochs=1000, verbose=2, validation_split=0.1)

does the trick, where X and Y are the training data and targets, respectively. No additional information about the class distribution is provided. How does model.fit() differ from my training code? Does it do something different when sampling from the dataset?
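
For context, the Keras baseline I'm porting ends in a softmax Dense layer and, as far as I can tell, is compiled with sparse categorical cross-entropy and Adam. A minimal stand-in for that tail (paraphrased, not the full baseline; layer sizes are illustrative) looks roughly like this:

from keras import layers, models, optimizers, losses, activations

inp = layers.Input(shape=(187, 1))
x = layers.Conv1D(16, kernel_size=5, activation=activations.relu, padding="valid")(inp)
x = layers.GlobalMaxPool1D()(x)
x = layers.Dense(64, activation=activations.relu)(x)
out = layers.Dense(5, activation=activations.softmax)(x)

keras_model = models.Model(inputs=inp, outputs=out)
keras_model.compile(optimizer=optimizers.Adam(0.001),
                    loss=losses.sparse_categorical_crossentropy,
                    metrics=['acc'])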

Note that the Keras implementation does not require class_weights. The PyTorch implementation, on the other hand, does not work either with or without class_weights.

Upvotes: 0

Views: 58

Answers (0)
