Farshid Rayhan

Face alignment in PyTorch

I am trying to do face alignment on the 300W dataset. I am using ResNet50 and L1 loss for training. My code looks like this:

import sys

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision

batch_size = 10
image_size = 128

net = torchvision.models.resnet50(pretrained=True)
num_ftrs = net.fc.in_features
net.fc = nn.Linear(num_ftrs, 136)  # 136 because 68 points with 2 dims, so 136 = 68*2

def train():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    net.to(device)
    net.train()

    optimiser = optim.Adam(net.parameters(), lr=0.001, weight_decay=0.0005)

    criterion = nn.L1Loss(reduction='sum')

    for epoch in range(200000):
        running_loss = 0.0
        for batch, data in enumerate(trainloader, 0):
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            optimiser.zero_grad()

            # reshape the 136 outputs to (batch, 68, 2) to match the labels
            outputs = net(inputs).reshape(-1, 68, 2)

            loss = criterion(outputs, labels)
            loss.backward()
            optimiser.step()
            running_loss += loss.item()

            sys.stdout.write(
                '\rTrain Epoch: {} Batch {} avg_Loss_per_batch: {:.2f}'.format(
                    epoch, batch, running_loss / (batch + 1)))
            sys.stdout.flush()

The trainloader yields images and landmark points. The ground truths are shaped as (batch, 68, 2): there are 68 points on the face in 2-dimensional space.
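For context, a minimal sketch of a dataset that yields pairs in this shape could look like the code below. The class name LandmarkDataset, its arguments, and the loading details are assumptions, since the actual trainloader code is not shown in the question.

import cv2
import torch
from torch.utils.data import Dataset, DataLoader

class LandmarkDataset(Dataset):
    # Hypothetical dataset: yields (image tensor, landmarks of shape (68, 2)).
    def __init__(self, image_paths, landmarks):
        self.image_paths = image_paths   # list of N image file paths
        self.landmarks = landmarks       # array of shape (N, 68, 2), pixel coords

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img = cv2.imread(self.image_paths[idx])
        img = cv2.normalize(img, None, alpha=0, beta=1,
                            norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)
        img = torch.from_numpy(img).permute(2, 0, 1)  # HWC -> CHW float tensor
        pts = torch.as_tensor(self.landmarks[idx], dtype=torch.float32)
        return img, pts

# trainloader = DataLoader(LandmarkDataset(train_paths, train_pts),
#                          batch_size=batch_size, shuffle=True)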

The paper suggests that ResNet50 should reach an error of about 10 pixels on a 256*256 image with L1 loss. I am getting errors around 500-800 on the validation set even after 5000 epochs.
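For reference, a per-landmark pixel error is usually computed as the mean Euclidean distance between predicted and ground-truth points; a minimal sketch is below. Whether this matches the exact metric reported in the paper is an assumption.

import torch

def mean_pixel_error(pred, target):
    # pred, target: tensors of shape (batch, 68, 2) in pixel coordinates.
    # Euclidean distance of each predicted point from its ground truth: (batch, 68)
    dists = torch.norm(pred - target, dim=2)
    return dists.mean().item()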

I am training on images with 256*256 resolution and a ground truth of 68 points, ((x1,y1),(x2,y2),...,(x68,y68)), and I have trained for over 5000 epochs with many learning rates. My validation code looks like this:

def validater(load_weights=False):
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    net.eval()
    net.to(device)

    criterion = nn.L1Loss(reduction='sum')
    avg_loss, avg_loss2 = 0.0, 0.0

    with torch.no_grad():
        for batch, data in enumerate(testloader, 0):
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            outputs = net(inputs).reshape(-1, 68, 2)

            loss = criterion(outputs, labels)
            avg_loss += loss.item()

            # Euclidean norm of the prediction error over the whole batch
            loss2 = np.linalg.norm((labels - outputs).cpu().numpy())
            avg_loss2 += loss2

            sys.stdout.write(
                '\rTest Epoch: {} Batch {} total_L1_Loss: {:.2f} '
                'avg_L1_Loss_per_img: {:.2f} total_norm_loss: {:.2f}'.format(
                    0, batch, avg_loss, avg_loss / (batch + 1) / batch_size,
                    avg_loss2))
            sys.stdout.flush()

    print()

    print()

What is wrong with my code?

PS: I normalise the images with the following code:

    img = cv2.normalize(img, None, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)

After 4000 epochs I get outputs like the image below, where the yellow dots are the ground truth and the blue ones are the predictions.

Upvotes: 3

Answers (1)

Shai

From your output image you can tell that the error is smaller on the top-left landmarks and grows larger towards the lower-right part of the face.
The landmarks you are trying to predict are (x, y) coordinates relative to the top-left corner of the image. As you can see, your model's prediction error grows roughly in proportion to the norm of each coordinate. This is not an uncommon phenomenon: when your model predicts a landmark close to the origin (e.g. the left eye) it makes "small" predictions, since the norm of this landmark is also small; the learned weights are small and therefore the errors are small as well. On the other hand, when predicting landmarks far from the origin (e.g. the right part of the mouth) the model needs to make "large" predictions, since the norm of these landmarks is large. Consequently, the trained weights are larger, resulting in larger errors.

To mitigate this, you should pre-process your data (train and test) and normalize the coordinates of the landmarks¹ so that they are:
1. relative to the center of the image
2. relative to the image size

That is, instead of (x, y) coordinates in the range [0, width]x[0, height] you should have the landmarks in the range [-1, 1]x[-1, 1].
After prediction, to get the original coordinates you simply need to shift them back and scale them by the image size.
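A minimal sketch of this normalization and its inverse, assuming the landmarks are tensors of shape (..., 68, 2) in pixel coordinates and width/height are the full image dimensions:

def normalize_landmarks(pts, width, height):
    # (x, y) pixel coordinates -> roughly [-1, 1] x [-1, 1],
    # relative to the image center and scaled by half the image size.
    center = pts.new_tensor([width / 2.0, height / 2.0])
    return (pts - center) / center

def denormalize_landmarks(pts, width, height):
    # Inverse mapping: [-1, 1] coordinates back to pixels.
    center = pts.new_tensor([width / 2.0, height / 2.0])
    return pts * center + center

With this, the labels would be normalized before training (e.g. inside the dataset) and the network outputs de-normalized back to pixels before measuring errors at validation time.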


¹ I am assuming here that all faces in the training set are roughly the same size and located roughly in the center of the images. If your setting is "in the wild", where faces can be of any size at any place in the image, I'm afraid this will not work.

Upvotes: 1
