PyTorch doesn't seem to be optimizing correctly

Question

I have posted this question on Data Science StackExchange site since StackOverflow does not support LaTeX. Linking it here because this site is probably more appropriate.

The question with correctly rendered LaTeX is here: https://datascience.stackexchange.com/questions/48062/pytorch-does-not-seem-to-be-optimizing-correctly

The idea is that I am considering sums of sine waves with different phases. The waves are sampled with some sample rate s in the interval [0, 2pi]. I need to select phases in such a way, that the sum of the waves at any sample point is minimized.

Below is the Python code. Optimization does not seem to be computed correctly.

import numpy as np
import torch

def phaseOptimize(n, s = 48000, nsteps = 1000):
    learning_rate = 1e-3

    theta = torch.zeros([n, 1], requires_grad=True)
    l = torch.linspace(0, 2 * np.pi, s)
    t = torch.stack([l] * n)
    T = t + theta

    for jj in range(nsteps):
        loss = T.sin().sum(0).pow(2).sum() / s
        loss.backward()
        theta.data -= learning_rate * theta.grad.data

    print('Optimal theta: 

', theta.data)
    print('

Maximum value:', T.sin().sum(0).abs().max().item())

Below is a sample output.

phaseOptimize(5, nsteps=100)


Optimal theta: 

 tensor([[1.2812e-07],
        [1.2812e-07],
        [1.2812e-07],
        [1.2812e-07],
        [1.2812e-07]], requires_grad=True)


Maximum value: 5.0

I am assuming this has something to do with broadcasting in

T = t + theta

and/or the way I am computing the loss function.

One way to verify that optimization is incorrect, is to simply evaluate the loss function at random values for the array $ heta_1, \dots, heta_n$, say uniformly distributed in $[0, 2\pi]$. The maximum value in this case is almost always much lower than the maximum value reported by phaseOptimize(). Much easier in fact is to consider the case with $n = 2$, and simply evaluate at $ heta_1 = 0$ and $ heta_2 = \pi$. In that case we get:

phaseOptimize(2, nsteps=100)

Optimal theta: 

 tensor([[2.8599e-08],
        [2.8599e-08]])


Maximum value: 2.0

On the other hand,

theta = torch.FloatTensor([[0], [np.pi]])
l = torch.linspace(0, 2 * np.pi, 48000)
t = torch.stack([l] * 2)
T = t + theta

T.sin().sum(0).abs().max().item()

produces

3.2782554626464844e-07

Sergii Dymchenko · Accepted Answer

You have to move computing T inside the loop, or it will always have the same constant value, thus constant loss.

Another thing is to initialize theta to different values at indices, otherwise because of the symmetric nature of the problem the gradient is the same for every index.

Another thing is that you need to zero gradient, because backward just accumulates them.

This seems to work:

def phaseOptimize(n, s = 48000, nsteps = 1000):
    learning_rate = 1e-1

    theta = torch.zeros([n, 1], requires_grad=True)
    theta.data[0][0] = 1
    l = torch.linspace(0, 2 * np.pi, s)
    t = torch.stack([l] * n)

    for jj in range(nsteps):
        T = t + theta
        loss = T.sin().sum(0).pow(2).sum() / s
        loss.backward()
        theta.data -= learning_rate * theta.grad.data
        theta.grad.zero_()

PyTorch doesn't seem to be optimizing correctly

Answers (2)

Related Questions

PyTorch doesn&#39;t seem to be optimizing correctly

Answers (2)

Related Questions

PyTorch doesn't seem to be optimizing correctly