Reputation: 1824
I'm new to PyTorch. I want to keep track of the distance in parameter space that my model travels over the course of its optimization. This is the code I'm using.
import numpy as np

class ParameterDiffer(object):
    def __init__(self, network):
        # Take a snapshot of the parameters at construction time.
        network_params = []
        for p in network.parameters():
            network_params.append(p.data.numpy())
        self.network_params = network_params

    def get_difference(self, network):
        # Sum of squared differences between the snapshot and the current parameters.
        total_diff = 0.0
        for i, p in enumerate(network.parameters()):
            p_np = p.data.numpy()
            diff = self.network_params[i] - p_np
            # print(diff)
            scalar_diff = np.sum(diff ** 2)
            total_diff += scalar_diff
        return total_diff
Will this work? I keep track of total_diff over time and log it, but it is ALWAYS zero, even though the model's performance is improving, which confuses me greatly.
Upvotes: 0
Views: 950
Reputation: 28457
This is because of the way PyTorch handles conversion between a numpy array and a torch Tensor. If the underlying data types of the numpy array and the torch Tensor are the same, they share memory: changing the value of one also changes the value of the other. I will show a concrete example here,
x = Variable(torch.rand(2, 2))
y = x.data.numpy()
x
Out[39]:
Variable containing:
0.8442 0.9968
0.7366 0.4701
[torch.FloatTensor of size 2x2]
y
Out[40]:
array([[ 0.84422851, 0.996831 ],
[ 0.73656738, 0.47010136]], dtype=float32)
Then if you change x in-place and look at the values in x and y, you will find they are still the same.
x += 2
x
Out[42]:
Variable containing:
2.8442 2.9968
2.7366 2.4701
[torch.FloatTensor of size 2x2]
y
Out[43]:
array([[ 2.84422851, 2.99683094],
[ 2.7365675 , 2.47010136]], dtype=float32)
So during your model updates, the parameters in your model and the arrays stored in ParameterDiffer will always be the same. That is why you are seeing zeros.
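Here is a minimal sketch of the same effect with your class (the toy nn.Linear model, SGD optimizer, and random input are just placeholders, assuming a recent PyTorch version):
import torch
import torch.nn as nn

model = nn.Linear(2, 2)
differ = ParameterDiffer(model)        # the stored numpy arrays alias the live parameters

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss = model(torch.rand(4, 2)).sum()
loss.backward()
optimizer.step()                       # updates the parameters in place

print(differ.get_difference(model))    # 0.0 -- the "snapshot" moved along with the model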
If the underlying data types of the numpy array and the torch Tensor are not compatible, the conversion forces a copy of the original Tensor data, so the numpy array and the torch Tensor end up with separate memory.
A simple way is just to convert the numpy array to type np.float64. Instead of
network_params.append(p.data.numpy())
You can use
network_params.append(p.data.numpy().astype(np.float64))
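An explicit copy works just as well and keeps the parameters in float32 (a sketch of an alternative, not part of the fix above):
network_params.append(p.data.numpy().copy())
Either way, the stored arrays no longer share memory with the live parameters, so get_difference will report the actual distance travelled.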
Upvotes: 2