I have an iterator that returns modified values from an iterator created by SQL commands in apsw:
class DatabaseIterator():
    """
    turns the select statement iterator into a deserialized iterator
    I.E. jsonBlob will be a tweet Class
    """
    def __init__(self, dbResult):
        self.dbResult = copy.copy(dbResult) #copying the iterator from apsw
    def __iter__(self):
        return self
    def __next__(self):
        try:
            r = self.dbResult.__next__()
            return {"id" : r[0], "metadata" : pickle.loads(r[1])}
        except StopIteration:
            raise StopIteration
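For illustration, the same deserializing wrapper can be sketched as a generator function, which needs no `__next__` method and no copy of the underlying iterator. This is only a sketch: `deserialized_rows` and `fake_rows` are hypothetical names, and a plain list of tuples stands in for the apsw result.

```python
import pickle


def deserialized_rows(rows):
    """Yield one dict per (id, pickled_blob) row from any iterable of rows."""
    for row in rows:
        yield {"id": row[0], "metadata": pickle.loads(row[1])}


# Hypothetical stand-in for the rows a SELECT statement would produce.
fake_rows = [
    (1, pickle.dumps({"text": "hello"})),
    (2, pickle.dumps({"text": "world"})),
]

result = list(deserialized_rows(fake_rows))
```

The generator stops on its own when `rows` is exhausted, so no explicit `StopIteration` handling is required.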
I have a class that performs some actions on this data using the multiprocessing library. To make this work, I first loop through the iterator and count how many items it contains:
def __countItemsInIterator(self, iterator):
    """
    needed as we do not know length of the iterator until we iterate through it
    and may be too big to store as a list in memory
    """
    counter = 0
    for i in iterator:
        counter += 1
    return counter
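Counting this way consumes the iterator, so one possible approach is to pass around a callable that builds a fresh iterator each time, rather than the iterator itself. The sketch below assumes that the data source can simply be asked again (for example by re-running the query); `count_items` and `make_numbers` are hypothetical names, and `range` stands in for the database result.

```python
def count_items(make_iterator):
    """Count items by consuming a freshly created iterator."""
    return sum(1 for _ in make_iterator())


def make_numbers():
    # Hypothetical stand-in for re-running the SELECT statement.
    return iter(range(5))


total = count_items(make_numbers)

# A second call yields a new, unconsumed iterator for the real work.
fresh = list(make_numbers())
```

Each call to the factory gives an independent iterator, so nothing has to be copied.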
I then loop through the iterator adding data from the iterator to a list. Once we get enough data I assign it to a process to do my work:
for tweet in spaceTimeTweetCollection:
    counter += 1
    dataset.append(tweet)
    if sizeOfDataset % counter == 0 and counter >= section or counter >= sizeOfDataset:
        self.start_worker(dataset)
        dataset = []
I then clear the list and start again adding more data. I do this because I cannot turn the iterator into a list as the list would be too big to fit in memory.
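As a point of comparison, fixed-size batching can be sketched with `itertools.islice`, which does not need the total length up front. This is a minimal sketch rather than the code above: `batches` is a hypothetical helper, and `range(7)` stands in for the tweet iterator.

```python
from itertools import islice


def batches(iterable, batch_size):
    """Yield lists of up to batch_size items until the iterable is exhausted."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


chunks = list(batches(range(7), 3))
```

Only one batch is held in memory at a time, and the final batch may be shorter than `batch_size`.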
To make this work I need a copy of the iterator (an iterator can only be used once), so I made a temporary copy of it to count the size:
temp = copy.deepcopy(spaceTimeTweetCollection)
This worked fine with unit tests; however, when I hook it up to the database it fails, because apsw does not like me copying the iterator:
  File "C:\Python33\lib\copy.py", line 97, in copy
    return _reconstruct(x, rv, 0)
  File "C:\Python33\lib\copy.py", line 287, in _reconstruct
    y = callable(*args)
  File "C:\Python33\lib\copyreg.py", line 88, in __newobj__
    return cls.__new__(cls, *args)
TypeError: object.__new__(apsw.Cursor) is not safe, use apsw.Cursor.__new__()
Does anyone know how I can solve this?