Reputation: 1824
I'm new to PyTorch. I want to keep track of the distance in parameter space that my model travels over the course of its optimization. This is the code I'm using.
import numpy as np

class ParameterDiffer(object):
    def __init__(self, network):
        # Take a snapshot of the parameters at construction time.
        network_params = []
        for p in network.parameters():
            network_params.append(p.data.numpy())
        self.network_params = network_params

    def get_difference(self, network):
        # Sum of squared differences between the snapshot and the current parameters.
        total_diff = 0.0
        for i, p in enumerate(network.parameters()):
            p_np = p.data.numpy()
            diff = self.network_params[i] - p_np
            # print(diff)
            scalar_diff = np.sum(diff ** 2)
            total_diff += scalar_diff
        return total_diff
Will this work? I keep track of total_diff over time and log it, but it is ALWAYS zero, even though the model's performance is improving, which confuses me greatly.
Upvotes: 0
Views: 950
Reputation: 28457
This is because of the way PyTorch handles conversion between a numpy array and a torch Tensor. If the underlying data types of the numpy array and the torch Tensor are the same, they share memory: changing the value of one also changes the value of the other. I will show a concrete example here,
x = Variable(torch.rand(2, 2))
y = x.data.numpy()
x
Out[39]:
Variable containing:
0.8442 0.9968
0.7366 0.4701
[torch.FloatTensor of size 2x2]
y
Out[40]:
array([[ 0.84422851, 0.996831 ],
[ 0.73656738, 0.47010136]], dtype=float32)
Then if you change x in-place and look at the values in x and y, you will find they are still the same.
x += 2
x
Out[42]:
Variable containing:
2.8442 2.9968
2.7366 2.4701
[torch.FloatTensor of size 2x2]
y
Out[43]:
array([[ 2.84422851, 2.99683094],
[ 2.7365675 , 2.47010136]], dtype=float32)
So during your model updates, the parameters in your model and the arrays stored in ParameterDiffer will always be the same. That is why you are seeing zeros.
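Here is a minimal sketch of the same effect with your class (the toy nn.Linear model, SGD optimizer, and random input are just placeholders, assuming a recent PyTorch version):
import torch
import torch.nn as nn

model = nn.Linear(2, 2)
differ = ParameterDiffer(model)        # the stored numpy arrays alias the live parameters

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss = model(torch.rand(4, 2)).sum()
loss.backward()
optimizer.step()                       # updates the parameters in place

print(differ.get_difference(model))    # 0.0 -- the "snapshot" moved along with the model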
If the underlying data types of the numpy array and the torch Tensor are not compatible, the conversion forces a copy of the original Tensor data, so the numpy array and the torch Tensor end up with separate memory.
A simple way is just to convert the numpy array to type np.float64. Instead of
network_params.append(p.data.numpy())
You can use
network_params.append(p.data.numpy().astype(np.float64))
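An explicit copy works just as well and keeps the parameters in float32 (a sketch of an alternative, not part of the fix above):
network_params.append(p.data.numpy().copy())
Either way, the stored arrays no longer share memory with the live parameters, so get_difference will report the actual distance travelled.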
Upvotes: 2