zidsal

Python: using an iterator more than once

I have an iterator that returns modified values from another iterator, which is created by SQL commands run through apsw:

class DatabaseIterator():
    """
    turns the select statement iterator into a deserialized iterator
    I.E. jsonBlob will be a tweet Class
    """
    def __init__(self, dbResult):
        self.dbResult = copy.copy(dbResult) #copying the iterator from apsw

    def __iter__(self):
        return self

    def __next__(self):
        # next() raises StopIteration by itself once the rows run out,
        # so there is no need to catch and re-raise it
        r = next(self.dbResult)
        return {"id" : r[0], "metadata" : pickle.loads(r[1])}

I have a class that processes this data using the multiprocessing library. To make this work, I first loop through the iterator and count how many items it contains:

def __countItemsInIterator(self, iterator):
    """
    needed as we do not know length of the iterator until we iterate through it
    and may be too big to store as a list in memory
    """
    counter = 0

    for i in iterator:
        counter += 1

    return counter
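
To illustrate why counting this way is a problem, here is a minimal sketch showing that the count consumes the iterator. The `rows` iterator is a hypothetical stand-in for the apsw cursor, and `count_items` is an illustrative standalone version of the method above:

```python
def count_items(iterator):
    """Count items by consuming the iterator, as the method above does."""
    return sum(1 for _ in iterator)


# Hypothetical stand-in for the rows coming back from the database.
rows = iter([("a", 1), ("b", 2), ("c", 3)])

first_count = count_items(rows)   # walks through every row
second_count = count_items(rows)  # the iterator is already exhausted

print(first_count, second_count)
```

After the first call there is nothing left for a second pass, which is why the same iterator cannot be used for both counting and processing.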

I then loop through the iterator, adding its items to a list. Once the list holds enough data, I hand it to a worker process:

        for tweet in spaceTimeTweetCollection:
            counter += 1
            dataset.append(tweet)


            if sizeOfDataset % counter == 0 and counter >= section or counter >= sizeOfDataset:
                self.start_worker(dataset)
                dataset = []

I then clear the list and start again adding more data. I do this because I cannot turn the iterator into a list as the list would be too big to fit in memory.
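
The batching idea described above can be sketched without knowing the total size, assuming a fixed batch size is acceptable. The `batches` helper and the generated `tweets` are hypothetical and only stand in for the real collection and for `start_worker`:

```python
from itertools import islice


def batches(iterable, batch_size):
    """Yield lists of up to batch_size items without loading everything at once."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # the source iterator is exhausted
        yield batch


# Hypothetical stand-in for the deserialized tweets.
tweets = ({"id": i} for i in range(7))

# In the real code each batch would be passed to a worker instead.
sizes = [len(batch) for batch in batches(tweets, 3)]
print(sizes)
```

Only one batch is held in memory at a time, and the last batch is simply shorter if the data does not divide evenly.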

Because an iterator can only be consumed once, I made a temporary copy of it to count the size:

temp = copy.deepcopy(spaceTimeTweetCollection)

This worked fine in unit tests, but it fails when I hook it up to the database, because apsw does not like me copying the iterator:

  File "C:\Python33\lib\copy.py", line 97, in copy
    return _reconstruct(x, rv, 0)
  File "C:\Python33\lib\copy.py", line 287, in _reconstruct
    y = callable(*args)
  File "C:\Python33\lib\copyreg.py", line 88, in __newobj__
    return cls.__new__(cls, *args)
TypeError: object.__new__(apsw.Cursor) is not safe, use apsw.Cursor.__new__()

Does anyone know how I can solve this?
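
One direction that might avoid copying the cursor at all is to build a fresh iterator for every pass rather than duplicating an existing one. This is only a sketch under assumptions: `ReiterableRows` and `row_factory` are hypothetical names, and the stand-in data below replaces the real query, which the factory would presumably re-run:

```python
import pickle


class ReiterableRows:
    """Re-iterable wrapper: every pass asks the factory for a fresh row iterator."""

    def __init__(self, row_factory):
        # Hypothetical callable that returns a new iterator of (id, blob) rows,
        # for example by executing the SELECT statement again.
        self.row_factory = row_factory

    def __iter__(self):
        for r in self.row_factory():
            yield {"id": r[0], "metadata": pickle.loads(r[1])}


# Stand-in data instead of a real database result.
stored = [(1, pickle.dumps("first")), (2, pickle.dumps("second"))]
collection = ReiterableRows(lambda: iter(stored))

count = sum(1 for _ in collection)  # first pass: count the items
items = list(collection)            # second pass: still yields every row
print(count, items)
```

The trade-off is that the query runs once per pass, so this only makes sense if executing it twice is cheaper than holding the whole result in memory.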
