Reputation: 1695
I have millions of domains which I will send WHOIS query and record WHOIS response on some .txt file.
I would like to set maximum capacity for a single .txt
output file. For example, let's say I started recording responses on out0.txt
. I want to switch to out1.txt
if out0.txt
is >= 100mb. Same thing goes for out1.txt
, if out1.txt
>=100mb then start writing to out2.txt
and so on.
I know that I can do if
checks after each insertion, but I want my code to be fast: i.e. I thought if checks at each domain can slow down my code. (It will asynchronously query millions of domains).
I imagined a try-except block could solve my issue here, like this:
folder_name = "out%s.txt"
folder_number = 0
folder_name = folder_name % folder_number
f = open(folder_name, 'w+')
for domain in millions_of_domains:
try:
response_json = send_whois_query(domain)
f.write(response_json)
except FileGreaterThan100MbException:
folder_number += 1
folder_name = folder_name % folder_number
f = open(folder_name, 'w+')
f.write(response_json)
Any suggestions will be appreciated. Thank you for your time.
Upvotes: 0
Views: 35
Reputation: 1121406
You can create a wrapper object that tracks how much data has been written, and opens a new file if you reached a limit:
class MaxSizeFileWriter(object):
def __init__(self, filenamepattern, maxdata=2**20, # default 1Mb
start=0, mode='w', *args, **kwargs):
self._pattern = filenamepattern
self._counter = start
self._mode = mode
self._args, self._kwargs = args, kwargs
self._max = maxdata
self._openfile = None
self._written = 0
def _open(self):
if self._openfile is not None:
filename = self._pattern.format(self._counter)
self._counter += 1
self._openfile = open(filename, mode=self._mode, *self._args, **self._kwargs)
def _close(self):
if self._openfile is not None:
self._openfile.close()
def __enter__(self):
return self
def __exit__(self, *args, **kwargs):
if self._openfile is not None:
self._openfile.close()
def write(self, data):
if self._written + len(data) > self._max:
# current file too full to fit data too, close it
# This will trigger a new file to be opened.
self._close()
self._open() # noop if already open
self._openfile.write(data)
self._written += len(data)
The above is a context manager, and can be used just like a regular file. Pass in a filename with a {}
placeholder for the number to be inserted into:
folder_name = "out{}.txt"
with MaxSizeFileWriter(folder_name, maxdata=100 * 2**10) as f:
for domain in millions_of_domains:
response_json = send_whois_query(domain)
f.write(response_json)
Upvotes: 1