user876901

Reputation: 51

Read data from numpy array into a pytorch tensor without creating a new tensor

Let's say I have a numpy array arr = np.array([1, 2, 3]) and a pytorch tensor tnsr = torch.zeros(3,)

Is there a way to read the data contained in arr into the tensor tnsr, which already exists, rather than simply creating a new tensor like tnsr1 = torch.tensor(arr)?

This is a simplified example of the problem, since I am using a dataset that contains nearly 17 million entries.

EDIT: I know I can manually loop through each entry in the array. With 17 million entries, that would take quite a while, I believe...
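For reference, the manual loop mentioned in the edit would look something like this (a sketch using the small arr and tnsr from the question; with ~17 million entries this per-element assignment is the slow part):

```python
import numpy as np
import torch

arr = np.array([1, 2, 3])
tnsr = torch.zeros(3,)

# Naive element-by-element copy: correct, but each iteration pays
# Python-level overhead, so it scales badly to millions of entries.
for i in range(len(arr)):
    tnsr[i] = arr[i]

print(tnsr)  # tensor([1., 2., 3.])
```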

Upvotes: 5

Views: 1962

Answers (1)

Lukas S

Reputation: 3603

You can do that using torch.from_numpy(arr). Here is an example that shows that the data is not being copied.

import numpy as np
import torch

arr = np.random.randint(0,high=10**6,size=(10**4,10**4))
%timeit arr.copy()

tells me that it took 492 ms ± 6.54 ms to copy the array of random integers. On the other hand

%timeit torch.from_numpy(arr)

tells me that it took 1.14 µs ± 131 ns to turn it into a tensor. So there is no way that the 100 million integers could have been copied. PyTorch is still using the same data.
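You can also confirm the sharing directly rather than inferring it from timings: mutate the numpy array and watch the change appear through the tensor (a small sketch, separate from the timing runs above):

```python
import numpy as np
import torch

arr = np.array([1, 2, 3])
t = torch.from_numpy(arr)  # shares arr's memory, no copy

arr[0] = 99           # write through the numpy side...
print(t[0].item())    # ...and the tensor sees it: 99

t[1] = 42             # writes go the other way as well
print(arr[1])         # 42
```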

Finally, your version, i.e.

%timeit torch.tensor(arr)

gives 201 ms ± 4.08 ms, which is quite surprising to me, since it should not be faster than numpy's copy at copying. But if it's not copying, what takes it 1/5 of a second? Maybe it's doing a shallow copy. Maybe somebody else can tell us what's going on exactly.
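As a side note on the original question: if the goal really is to fill an already-allocated tensor like the question's tnsr, the in-place Tensor.copy_ method does one vectorized bulk copy into the existing storage, with no Python-level loop (a sketch, again with the small arr and tnsr from the question):

```python
import numpy as np
import torch

arr = np.array([1, 2, 3])
tnsr = torch.zeros(3)

# from_numpy allocates no new data buffer (it wraps arr's memory),
# and copy_ writes into tnsr's existing storage, casting int64 -> float32.
tnsr.copy_(torch.from_numpy(arr))
print(tnsr)  # tensor([1., 2., 3.])
```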

Upvotes: 3
