Feng Shi

Reputation: 19

pytorch loss function for regression model with a vector of values

I'm training a CNN in PyTorch to solve a regression problem where the output is a tensor of 25 values. Each target tensor is either all zeros or a gaussian with a sigma of 2. An example of a 4-sample batch looks like this:

[[0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.8825, 0.60653, 0.32465, 0.13534, 0.043937, 0.011109, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
 [0., 0., 0., 0., 0., 0., 0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.8825, 0.60653, 0.32465, 0.13534, 0.043937, 0.011109, 0., 0., 0., 0., 0., 0., 0., 0.],
 [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.8825, 0.60653, 0.32465, 0.13534],
 [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]

My question is how to design a loss function so the model can effectively learn the 25-value regression output.

I have tried two types of loss: torch.nn.MSELoss(), and torch.nn.MSELoss() minus torch.nn.CosineSimilarity(). They sort of work. However, the network sometimes has difficulty converging, especially when there are a lot of all-zero samples, which leads the network to output a vector of 25 small values.
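
For reference, a minimal sketch of the combined loss (the exact weighting of the two terms is just one possible choice):

import torch

mse = torch.nn.MSELoss()
cos = torch.nn.CosineSimilarity(dim=1)

def combined_loss(pred, target):
    # pred, target: [batch, 25]; subtract the mean cosine similarity
    # so that more similar shapes lower the loss
    return mse(pred, target) - cos(pred, target).mean()

Note that cosine similarity is ill-defined for an all-zero target (PyTorch clamps the norms to eps, giving a similarity of 0), which may be part of why the all-zero samples cause trouble.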

Is there any other loss that we could try?

Upvotes: 1

Views: 1883

Answers (1)

Zoom

Reputation: 464

Your values do not seem widely different in scale, so an MSELoss should work fine. Your model could be collapsing because of the many all-zero targets.

You can always try torch.nn.L1Loss(), but I do not expect it to be much better than torch.nn.MSELoss().

I suggest that you instead try to predict the gaussian mean (mu), and re-create the gaussian for each sample afterwards if you really need it.

So you have two alternatives if you choose to try this method.

Alt 1

A good alternative is to encode your target to look like a classification target. Each 25-element vector becomes a single value: the index where the original target == 1 (so the possible classes are 0, 1, 2, ..., 24). We can then assign samples that contain only zeros to an extra class, 25. So your target:

[[0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.8825, 0.60653, 0.32465, 0.13534, 0.043937, 0.011109, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
 [0., 0., 0., 0., 0., 0., 0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.8825, 0.60653, 0.32465, 0.13534, 0.043937, 0.011109, 0., 0., 0., 0., 0., 0., 0., 0.],
 [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.8825, 0.60653, 0.32465, 0.13534],
 [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]

becomes

[4,
10,
20,
25]

If you do this, then you can use the standard torch.nn.CrossEntropyLoss().
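
For illustration, a minimal sketch (this assumes your network's final layer outputs 26 raw logits, one per class, with no softmax):

import torch

criterion = torch.nn.CrossEntropyLoss()

# Stand-in for the network output: 26 raw logits per sample
logits = torch.randn(4, 26)
# The encoded targets from the example above, as class indices
targets = torch.tensor([4, 10, 20, 25])

loss = criterion(logits, targets)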

I do not know what your dataloader looks like, but given a single sample in your original format, you can convert it to my proposed format with:

def encode(tensor):
    # All-zero samples map to the extra "no gaussian" class (index 25)
    if tensor.sum() == 0:
        return torch.tensor(len(tensor))
    # Otherwise the class is the position of the gaussian peak
    return torch.argmax(tensor)

and back to a gaussian with:

def decode(value):
    n_values = 25
    zero = torch.zeros(n_values)
    # The extra class (25) means "all zeros"
    if value == n_values:
        return zero
    value = int(value)
    # Create a gaussian (sigma = 2) centred on the class index
    std = 2
    n = torch.arange(n_values) - value
    sig = 2*std**2
    gauss = torch.exp(-n**2 / sig)
    # Keep only the 13 values within +/-6 of the peak,
    # as in the original targets
    start_ix = max(value-6, 0)
    end_ix = min(value+7, n_values)
    zero[start_ix:end_ix] = gauss[start_ix:end_ix]
    return zero

(Note: I have not tried these with batches, only single samples.)
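
If you need batch support, one simple (untested) option is to apply encode per sample and stack the results:

def encode_batch(batch):
    # batch: [batch_size, 25] -> [batch_size] tensor of class indices
    return torch.stack([encode(sample) for sample in batch])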

Alt 2

The second option is to turn your regression target (still just the argmax position, mu) into a nicer regression value in the range 0-1, and add a separate output neuron for a "mask value" (also 0-1). Then your batch of:

[[0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.8825, 0.60653, 0.32465, 0.13534, 0.043937, 0.011109, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
 [0., 0., 0., 0., 0., 0., 0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.8825, 0.60653, 0.32465, 0.13534, 0.043937, 0.011109, 0., 0., 0., 0., 0., 0., 0., 0.],
 [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.8825, 0.60653, 0.32465, 0.13534],
 [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]

becomes

# [Mask, mu]
[
[1, 0.1666], # True, 4/24
[1, 0.4166], # True, 10/24
[1, 0.8333], # True, 20/24
[0, 0]       # False, undefined
]

If you use this setup, then you should be able to use an MSELoss with a small modification:

def custom_loss(input, target):
    # Assume input and target are of shape [Batch, 2], ordered [mask, mu]
    mask = target[..., 0]
    mask_loss = torch.nn.functional.mse_loss(input[..., 0], target[..., 0])
    # The mu term only contributes where the target mask is 1
    mu_loss = torch.nn.functional.mse_loss(mask*input[..., 1], mask*target[..., 1])
    return (mask_loss + mu_loss) / 2

This loss only looks at the second value (mu) when the target mask is 1; otherwise it only tries to optimize for the correct mask.
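
For example (a toy sketch; the sigmoid head that keeps both outputs in 0-1 is an assumption about your model):

import torch

# Stand-in predictions: a sigmoid keeps both outputs in the range 0-1
pred = torch.sigmoid(torch.randn(4, 2))
target = torch.tensor([[1, 0.1666],
                       [1, 0.4166],
                       [1, 0.8333],
                       [0, 0.0]])

loss = custom_loss(pred, target)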

To encode to this format you would use:

def encode(tensor):
    n_values = 25
    # All-zero samples: mask 0, mu undefined (use 0)
    if tensor.sum() == 0:
        return torch.tensor([0., 0.])
    # Otherwise: mask 1, mu is the peak position scaled to 0-1
    mu = torch.argmax(tensor).item() / (n_values-1)
    return torch.tensor([1., mu])

and to decode:

def decode(tensor):
    n_values = 25

    # Parse values
    mask, value = tensor
    mask = torch.round(mask)
    value = int(torch.round((n_values-1)*value))

    zero = torch.zeros(n_values)
    # Mask 0 means "all zeros"
    if mask == 0:
        return zero
    # Create a gaussian (sigma = 2) centred on the decoded position
    std = 2
    n = torch.arange(n_values) - value
    sig = 2*std**2
    gauss = torch.exp(-n**2 / sig)
    # Keep only the 13 values within +/-6 of the peak,
    # as in the original targets
    start_ix = max(value-6, 0)
    end_ix = min(value+7, n_values)
    zero[start_ix:end_ix] = gauss[start_ix:end_ix]
    return zero
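
As a quick sanity check (my own addition), a gaussian target should survive the encode/decode round trip roughly unchanged:

std = 2
n = torch.arange(25) - 10
target = torch.exp(-n**2 / (2*std**2))
target[target < 0.011] = 0.    # zero the tails, as in the original targets

restored = decode(encode(target))
print(torch.allclose(target, restored, atol=1e-4))  # True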

Upvotes: 1
