Reputation: 46353
I have a big numpy.int32 array that can take 4 GB or more. It's in fact a 24-bit integer array (quite common in audio applications), but as numpy.int24 doesn't exist, I used an int32.
I want to output this array's data as 24-bit, i.e. 3 bytes per number, to a file.
This works (I found this "recipe" somewhere a while ago, but I can't find where anymore):
import numpy as np
x = np.array([[-33772,-2193],[13313,-1314],[20965,-1540],[10706,-5995],[-37719,-5871]], dtype=np.int32)
data = ((x.reshape(x.shape + (1,)) >> np.array([0, 8, 16])) & 255).astype(np.uint8)
print(data.tobytes())  # tobytes() replaces the deprecated tostring()
# b'\x14|\xffo\xf7\xff\x014\x00\xde\xfa\xff\xe5Q\x00\xfc\xf9\xff\xd2)\x00\x95\xe8\xff\xa9l\xff\x11\xe9\xff'
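To see why this works: the first sample, -33772, is 0xFFFF7C14 as a little-endian int32; shifting right by 0, 8 and 16 and masking with 255 extracts the three low bytes 0x14, 0x7C and 0xFF, which are exactly the first three bytes of the output above.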
But the reshape and the intermediate arrays created by the shift-and-mask make it inefficient when x is a few GB large: it takes a huge amount of unneeded RAM.
Another solution would be to remove every 4th byte:
s = bytes([c for i, c in enumerate(x.tobytes()) if i % 4 != 3])
# b'\x14|\xffo\xf7\xff\x014\x00\xde\xfa\xff\xe5Q\x00\xfc\xf9\xff\xd2)\x00\x95\xe8\xff\xa9l\xff\x11\xe9\xff'
It works, but I suspect that if x takes, say, 4 GB of RAM, this line would eat at least 8 GB of RAM, for both s and x (and maybe also for x.tobytes()?).
TL;DR: how can I efficiently write an int32 array to disk as a 24-bit array, by removing every 4th byte, without using twice the size of the actual data in RAM?
Note: this is possible since the integers are in fact 24-bit, i.e. each value fits in a signed 24-bit integer: -2^23 <= value <= 2^23 - 1.
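A quick sanity check of that precondition (a minimal sketch, using the example x above):
assert x.min() >= -(2**23) and x.max() <= 2**23 - 1  # every value fits in signed 24-bit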
Upvotes: 4
Views: 656
Reputation: 208003
I ran your code and got similar times to your 35 sec, but that seems way too slow for 750MB when my SSD can do 2GB/s. I can't imagine why it is so slow. So I decided to use OpenCV's highly optimised SIMD code which reduces an RGBA8888 image to RGB888 by stripping every 4th byte of Alpha/transparency information - which is equivalent to converting 32-bit to 24-bit.
In order not to use too much extra memory, I did it in chunks of 1,000,000 stereo samples (6 MB) at a time, appending to the output file. It runs in under 1 sec, and the file compares identical to the one created by your code.
#!/usr/bin/env python3
import numpy as np
import cv2
def orig(x):
    x2 = x.view(np.uint8).reshape(-1,4)[:,:3]
    x2.tofile('orig.dat')

def chunked(x):
    BATCHSIZE = 1_000_000
    l = len(x)
    # Open in binary mode so tofile() appends raw bytes correctly
    with open('test.dat', 'wb') as file:
        for b in range(0, l, BATCHSIZE):
            s = min(BATCHSIZE, l-b)
            y = x[b:b+s, :].view(np.uint8).reshape(s*2, 1, 4)
            # Strip the 4th byte of every 32-bit sample, as if dropping an alpha channel
            z = cv2.cvtColor(y, cv2.COLOR_BGRA2BGR)
            # Append to file
            z.tofile(file)
            if b+s == l:
                break
# Repeatable randomness
np.random.seed(42)
# Create array of stereo samples
NSAMPLES = 125_000_000
x = np.random.randint(10000, size=(NSAMPLES, 2), dtype=np.int32)
# orig(x)
chunked(x)
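For reference, the same chunked idea also works without OpenCV, using the uint8 view from the other answers; a minimal sketch (the helper name chunked_numpy is mine, not from the original code). tofile() copies each chunk's non-contiguous [:, :3] view, so peak extra memory stays around one batch:
def chunked_numpy(x, path='test_np.dat', batch=1_000_000):
    with open(path, 'wb') as f:
        for b in range(0, len(x), batch):
            # Keep the 3 low bytes of each little-endian int32 in this chunk
            x[b:b+batch].view(np.uint8).reshape(-1, 4)[:, :3].tofile(f)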
Upvotes: 1
Reputation: 59731
Assuming x is C-contiguous and that your platform is little-endian (it would need small adjustments otherwise), you can do it like this:
import numpy as np
# Input data
x = np.array([[-33772, -2193], [13313, -1314], [20965, -1540],
[10706, -5995], [-37719, -5871]], dtype=np.int32)
# Make 24-bit uint8 view
x2 = np.ndarray(shape=x.shape + (3,), dtype=np.uint8, buffer=x, offset=0,
strides=x.strides + (1,))
print(x2.tobytes())
# b'\x14|\xffo\xf7\xff\x014\x00\xde\xfa\xff\xe5Q\x00\xfc\xf9\xff\xd2)\x00\x95...
np.save('data.npy', x2) # Save to disk
In this example, note that:
- x.shape + (3,) is (5, 2, 3).
- x2 essentially is a view of x, that is, it uses the same data.
- x.strides + (1,) is here (8, 4, 1). Every new row of x advances 8 bytes with respect to its previous row, and every new column advances 4 bytes. In x2, I add a 1 to the strides so every item in the new innermost dimension advances 1 byte with respect to the previous one.
- If the shape of x2 was (5, 2, 4) (that is, using + (4,) instead of + (3,)), it would be the same as x, but since it is (5, 2, 3), the last byte is just "skipped over".
You can restore it with:
x2 = np.load('data.npy', mmap_mode='r') # Use mmap to avoid using extra memory
x3 = np.zeros(x2.shape[:-1] + (4,), np.uint8)
x3[..., :3] = x2
del x2 # Release mmap
# Fix negative sign in last byte (could do this in a loop
# or in "batches" if you want to avoid the intermediate
# array from the "&" operation, or with Numba)
x3[..., 3] = np.where(x3[..., 2] & 128, 255, 0)
# Make int32 view
x4 = np.ndarray(x3.shape[:-1], np.int32, buffer=x3, offset=0, strides=x3.strides[:-1])
print(x4)
# [[-33772 -2193]
# [ 13313 -1314]
# [ 20965 -1540]
# [ 10706 -5995]
# [-37719 -5871]]
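As a sanity check (a sketch, assuming the two snippets above ran in the same session on a little-endian platform), the same 24-bit view can also be built with numpy.lib.stride_tricks.as_strided, and the round trip can be verified:
from numpy.lib.stride_tricks import as_strided
x2b = as_strided(x.view(np.uint8), shape=x.shape + (3,), strides=x.strides + (1,))
print(np.shares_memory(x, x2b))  # True: it is a view, no data is copied
print(np.array_equal(x4, x))     # True: the round trip restored the original values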
Upvotes: 2
Reputation: 46353
After some more fiddling, I found this to work:
import numpy as np
x = np.array([[-33772,-2193],[13313,-1314],[20965,-1540],[10706,-5995],[-37719,-5871]], dtype=np.int32)
x2 = x.view(np.uint8).reshape(-1,4)[:,:3]
print(x2.tobytes())
# b'\x14|\xffo\xf7\xff\x014\x00\xde\xfa\xff\xe5Q\x00\xfc\xf9\xff\xd2)\x00\x95\xe8\xff\xa9l\xff\x11\xe9\xff'
Here is a time+memory benchmark:
import numpy as np, time
t0 = time.time()
x = np.random.randint(10000, size=(125_000_000, 2), dtype=np.int32) # 125M * 2 * 4 bytes ~ 1GB of RAM
print('Random array generated in %.1f sec.' % (time.time() - t0))
time.sleep(5)
# you can check the RAM usage in the task manager in the meantime...
t0 = time.time()
x2 = x.view(np.uint8).reshape(-1,4)[:,:3]
x2.tofile('test')
print('24-bit output file written in %.1f sec.' % (time.time() - t0))
Result:
Random array generated in 4.6 sec.
24-bit output file written in 35.9 sec.
Also, only ~1 GB of RAM was used during the whole process (monitored with the Windows Task Manager).
@jdehesa's method gave similar results, i.e. if we use this line instead:
x2 = np.ndarray(shape=x.shape + (3,), dtype=np.uint8, buffer=x, offset=0, strides=x.strides + (1,))
the RAM usage of the process also peaked at ~1 GB, and the time spent on x2.tofile(...) was ~37 sec.
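For completeness, the 24-bit file can be read back into int32 with the same sign-extension trick as in @jdehesa's answer; a minimal sketch, assuming the little-endian file 'test' written above:
import numpy as np
raw = np.fromfile('test', dtype=np.uint8).reshape(-1, 3)  # 3 bytes per value
buf = np.zeros((raw.shape[0], 4), dtype=np.uint8)
buf[:, :3] = raw
buf[:, 3] = np.where(raw[:, 2] & 0x80, 255, 0)  # sign-extend bit 23
x_restored = buf.view(np.int32).reshape(-1, 2)  # back to stereo int32 samples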
Upvotes: 2