Reputation: 51
Let's say I have a NumPy array arr = np.array([1, 2, 3]) and a PyTorch tensor tnsr = torch.zeros(3,).
Is there a way to read the data contained in arr into the already-existing tensor tnsr, rather than simply creating a new tensor like tnsr1 = torch.tensor(arr)?
This is a simplified example of the problem, since I am using a dataset that contains nearly 17 million entries.
EDIT: I know I can manually loop through each entry in the array, but with 17 million entries that would take quite a while, I believe...
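For illustration, the elementwise loop I mean would look something like this (a minimal sketch for the 1-D case above):

import numpy as np
import torch

arr = np.array([1, 2, 3])
tnsr = torch.zeros(3,)

# copy one entry at a time -- far too slow for ~17 million entries
for i in range(len(arr)):
    tnsr[i] = arr[i]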
Upvotes: 5
Views: 1962
Reputation: 3603
You can do that using torch.from_numpy(arr)
. Here is an example that shows that the data is not being copied.
import numpy as np
import torch
# 10,000 x 10,000 = 100 million random integers
arr = np.random.randint(0, high=10**6, size=(10**4, 10**4))
%timeit arr.copy()
tells me that it took 492 ms ± 6.54 ms
to copy the array of random integers.
On the other hand
%timeit torch.from_numpy(arr)
tells me that it took 1.14 µs ± 131 ns
to turn it into a tensor. So there is no way that the 100 million integers could have been copied. PyTorch is still using the same data.
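You can see the sharing directly with a small sketch: mutating the array in place also changes the tensor returned by torch.from_numpy.

import numpy as np
import torch

arr = np.array([1, 2, 3])
t = torch.from_numpy(arr)

arr[0] = 99   # modify the numpy array in place
print(t)      # tensor([99,  2,  3]) -- the tensor reflects the change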
Finally, your version, i.e.
%timeit torch.tensor(arr)
gives 201 ms ± 4.08 ms, which is quite surprising to me, since it should not be faster than NumPy's copy at copying. But if it's not copying, what takes it 1/5 of a second? Maybe it's doing a shallow copy. Maybe somebody else can tell us what's going on exactly.
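As a side note: if you really do need the values inside the pre-existing tnsr from the question rather than a new tensor, one way (a sketch, assuming the shapes are compatible; copy_ also handles the dtype conversion) is the in-place Tensor.copy_, which still does a single vectorized copy instead of a Python loop:

import numpy as np
import torch

arr = np.array([1, 2, 3])
tnsr = torch.zeros(3,)

# one vectorized copy into the existing tensor's storage,
# converting int64 -> float32 along the way
tnsr.copy_(torch.from_numpy(arr))
print(tnsr)   # tensor([1., 2., 3.])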
Upvotes: 3