Reputation: 2615
I have a rather simple question. I have a very large list defined in Python, and if I write it out to a single text file, the file ends up around 200 MB, which I cannot open easily.
Is there any option available within Python to set a maximum size for a file being written, and to create a new file once that size is exceeded?
To summarize, my code so far:
file = open("output_users.txt", "w")
file.write("Total number of users: " + str(len(user_id)))
file.write(str(user_id))
file.close()
Upvotes: 0
Views: 677
Reputation: 36718
There isn't a built-in way to do that in open(). What I would suggest is that you break up your data into several chunks, then open a different file per chunk. E.g., say you have just over ten thousand items (I use integers here for simplicity, but they could be user records or whatever you're working with) to process. You could split them into ten chunks like so, using the itertools module's groupby function to make your job a bit easier:
import itertools

original_data = range(10003)  # Note how this is *not* divisible by 10
num_chunks = 10
length_of_one_chunk = len(original_data) // num_chunks
chunked_data = []

def keyfunc(t):
    # Given a tuple of (index, data_item), return the index
    # divided by N where N is the length of one chunk. This
    # will produce the value 0 for the first N items, then 1
    # for the next N items, and so on, making this very
    # suitable for passing into itertools.groupby.
    # Note the // operator, which means integer division.
    return t[0] // length_of_one_chunk

for n, chunk in itertools.groupby(enumerate(original_data), keyfunc):
    # Drop the indexes that enumerate() added, keeping only the data items
    chunked_data.append([item for _, item in chunk])
This will produce a chunked_data list with a length of 11; each of its elements is a list of data items (in this case, they're just integers). The first ten items of chunked_data will each have N items, where N is the value of length_of_one_chunk (in this case, precisely 1000). The last element of chunked_data will be a list of the 3 leftover items that didn't fit evenly across the other lists; you could write them to a separate file, or just append them to the end of the last file.
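If you want to sanity-check the chunking, a quick inspection (assuming the code above has just run) might look like this:
print(len(chunked_data))      # 11
print(len(chunked_data[0]))   # 1000, the value of length_of_one_chunk
print(len(chunked_data[-1]))  # 3, the leftover items
print(chunked_data[-1])       # [10000, 10001, 10002]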
If you change the range(10003) to range(10027), then N would be 1002 and the last element would have 7 leftover items. And so on.
Then you just run chunked_data through a for loop, and for each list inside it, process the data normally, opening a new file each time. And you'll have your 10 files (or 8, or whatever you set num_chunks to).
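As a minimal sketch of that loop (the output_users_N.txt naming scheme here is just an assumption; adapt it and the per-file contents to whatever your records need):
for n, chunk in enumerate(chunked_data):
    # One output file per chunk: output_users_0.txt, output_users_1.txt, ...
    with open("output_users_{}.txt".format(n), "w") as f:
        f.write("Number of users in this file: " + str(len(chunk)) + "\n")
        f.write(str(chunk))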
Upvotes: 1