Reputation: 11
I have ~15,000 3-D tensors (each of size 2x300x20) that I need to save to disk. I've considered two options: 1. one big tensor of size 15000x2x300x20, or 2. a table of the 15,000 tensors.
I'd prefer to save them as one big tensor (via torch.save()), but unfortunately the resulting file is much bigger with the first option. Why is that? Is there an efficient way to save tensors to disk? (For example, with 160 tensors instead of 15,000, the file is 1.3 GB in the first option versus 10 MB in the second.)
Upvotes: 0
Views: 632
Reputation: 16121
The 2nd option (table of tensors) has an overhead, since it stores the header of each 2x300x20 tensor N (= 15,000) times (see below). But this overhead is negligible w.r.t. the total amount of data, so both options should be roughly equivalent in terms of space.
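To see why the headers don't matter, here is a rough size accounting (binary serialization, torch.DoubleTensor = 8 bytes per element). The ~100-byte per-tensor header below is an assumed ballpark for illustration, not an exact figure:

```python
# Rough size accounting for the two options.
n_tensors = 15000
elements_per_tensor = 2 * 300 * 20                  # 12,000 elements
data_bytes = n_tensors * elements_per_tensor * 8    # raw data, same in both options
header_bytes = n_tensors * 100                      # assumed ~100 B header per tensor

print(data_bytes)                    # 1440000000 (~1.34 GiB of raw data)
print(header_bytes / data_bytes)     # ~0.001 -> header overhead is negligible
```

So with correctly-sized storages, both layouts should land within a fraction of a percent of each other.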
That being said, please note that the underlying storage is part of the archive. That means that if, for some reason, the storage is larger than the current tensor size, the archive will be large too, e.g.:
x = torch.Tensor(100000)  -- allocates a storage of 100,000 doubles
x[1] = 1234
x:resize(1)               -- tensor now has 1 element, but the storage is unchanged
torch.save("x.t7", x)
y = torch.Tensor(1)       -- storage of a single double
y[1] = 1234
torch.save("y.t7", y)
Here x.t7 is around 782KB vs. 119B for y.t7, because it refers to an underlying storage of 100,000 elements.
In your first option, you should double-check that you are not in this case.
--
e.g. serializing a dummy Torch tensor in ASCII mode:
$ th -e "torch.save('foo.t7', torch.Tensor{1234}, 'ascii')"
$ xxd -g1 foo.t7
00000000: 34 0a 31 0a 33 0a 56 20 31 0a 31 38 0a 74 6f 72 4.1.3.V 1.18.tor
00000010: 63 68 2e 44 6f 75 62 6c 65 54 65 6e 73 6f 72 0a ch.DoubleTensor.
00000020: 31 0a 31 0a 31 0a 31 0a 34 0a 32 0a 33 0a 56 20 1.1.1.1.4.2.3.V
00000030: 31 0a 31 39 0a 74 6f 72 63 68 2e 44 6f 75 62 6c 1.19.torch.Doubl
00000040: 65 53 74 6f 72 61 67 65 0a 31 0a 31 32 33 34 0a eStorage.1.1234.
As you can see, the archive includes: a first integer (4 here) that denotes the type of the object; for a Torch class, other metadata like its version (V 1 here), etc.; and then the final value (1234 here).
Upvotes: 2