Reputation: 602
I have this nice piece of code that can generate files of specific sizes and write them to disk:
import os
import uuid

def file_generator(location, size):
    filename = str(uuid.uuid4())
    with open('{0}{1}'.format(location, filename), 'wb') as target:
        target.write(os.urandom(size))
    return filename
However, there is one small issue: it can't generate files bigger than the system RAM; it fails with a MemoryError. Any idea how to write the file out as a stream, or how to otherwise work around this issue?
Upvotes: 0
Views: 1439
Reputation: 41147
When dealing with this kind of problem, the solution is to break the data into chunks, choosing a chunk size that is small enough to fit comfortably in memory, yet large enough to keep the number of (relatively expensive) write operations low.
In the example below, the requested file size is split into 32 MiB chunks, resulting in a number (>= 0) of complete chunks, possibly followed by one incomplete chunk at the end.
code.py:
import sys
import os
import uuid

DEFAULT_CHUNK_SIZE = 33554432  # 32 MiB

def file_generator(location, size):
    filename = str(uuid.uuid4())
    with open('{0}{1}'.format(location, filename), 'wb') as target:
        target.write(os.urandom(size))
    return filename

def file_generator_chunked(location, size, chunk_size=DEFAULT_CHUNK_SIZE):
    file_name = str(uuid.uuid4())
    chunks = size // chunk_size  # number of complete chunks
    last_chunk_size = size % chunk_size  # final incomplete chunk (may be 0)
    with open("{0}{1}".format(location, file_name), "wb") as target:
        for _ in range(chunks):
            target.write(os.urandom(chunk_size))
        if last_chunk_size:
            target.write(os.urandom(last_chunk_size))
    return file_name

def main():
    file_name = file_generator_chunked("/tmp/", 100000000)  # note the trailing separator: location and filename are concatenated directly

if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    main()
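For a quick sanity check, a minimal sketch (reusing file_generator_chunked from code.py above; the path and size are just examples) can compare the on-disk size with the requested one:

import os

location = "/tmp/"  # example directory; note the trailing separator
requested = 100000000
name = file_generator_chunked(location, requested)
# The chunked writer should produce exactly the requested number of bytes.
assert os.path.getsize(location + name) == requested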
Upvotes: 2
Reputation: 59575
os.urandom
returns a bytes object of the requested size, and that object first needs to fit in memory. If it returned a generator instead, things would work in a more memory-efficient way (a generator-based sketch follows the code below).
Strictly speaking, though, the limit is not physical RAM. It does not depend on the amount of RAM installed on your machine; it is bounded by virtual memory, which is roughly 8 TiB for 64-bit programs on 64-bit Windows. However, allocating beyond physical RAM involves swapping to disk, which becomes very slow.
Therefore, the practical solution is to generate and write the data in chunks.
In contrast to @quamrana's answer, I would not change the method signature. The caller could still choose a single 8 GB block, which would have the same effect as before.
The following takes that burden from the caller:
import os
import uuid

def file_generator(location, size):
    filename = str(uuid.uuid4())
    chunksize = 10 * 1024 * 1024  # 10 MiB per write
    with open('{0}{1}'.format(location, filename), 'wb') as target:
        while size > chunksize:
            target.write(os.urandom(chunksize))
            size -= chunksize
        target.write(os.urandom(size))  # write the remainder
    return filename
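The generator idea mentioned earlier can also be made explicit in user code. A minimal sketch, assuming a made-up helper named random_chunks (not part of the standard library):

import os
import uuid

def random_chunks(size, chunksize=10 * 1024 * 1024):
    # Hypothetical helper: yield random bytes in pieces,
    # never holding more than one chunk in memory at a time.
    while size > 0:
        n = min(size, chunksize)
        yield os.urandom(n)
        size -= n

def file_generator(location, size):
    filename = str(uuid.uuid4())
    with open('{0}{1}'.format(location, filename), 'wb') as target:
        for chunk in random_chunks(size):
            target.write(chunk)
    return filename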
Upvotes: 3
Reputation: 39404
Write the file in blocks:
import os
import uuid

def large_file_generator(location, block_size, number_of_blocks):
    filename = str(uuid.uuid4())
    with open('{0}{1}'.format(location, filename), 'wb') as target:
        for _ in range(number_of_blocks):
            target.write(os.urandom(block_size))
    return filename
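Note that this changes the signature: the caller specifies a block size and a block count, so the total is always block_size * number_of_blocks. A usage sketch (values purely illustrative), producing a 1 GiB file as 1024 blocks of 1 MiB:

name = large_file_generator('/tmp/', 1024 * 1024, 1024)  # 1 MiB * 1024 blocks = 1 GiB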
Upvotes: 0