xbb

Reputation: 2163

How to preserve file write order when using threading in Python

I have some Python code that reads a file and pushes its lines onto a list. I then put the list items on a queue and process them with threads, say 20 at a time. After processing, I save the results to a new file. The lines in the new file end up in a different order from the original file. For example, given this input:

    1    a
    2    b
    3    c
    4    a
    5    d

But the output looks like:

    2    aa
    1    ba
    4    aa
    5    da
    3    ca

Is there any way to preserve the original order? Here is my code:


    import threading,Queue,time,sys
    class eSS(threading.Thread):
        def __init__(self,queue):
            threading.Thread.__init__(self)
            self.queue = queue
            self.lock = threading.Lock()
        def ess(self,email,code,suggested,comment,reason,dlx_score):
            #do something
        def run(self):
            while True:
                info = self.queue.get()
                infolist = info.split('\t')
                email = infolist[1]
                code = infolist[2]
                suggested = infolist[3]
                comment = infolist[4]
                reason = infolist[5]
                dlx_score = (0 if infolist[6] == 'NULL' else int(infolist[6]))
                g.write(info + '\t' + self.ess(email,code,suggested,comment,reason,dlx_score) +'\r\n')
                self.queue.task_done()

    if __name__ == "__main__":
        queue = Queue.Queue()
        filename = sys.argv[1]
        #Define number of threads
        threads = 20
        f = open(filename,'r')
        g = open(filename+'.eSS','w')
        lines = f.read().splitlines()
        f.close()
        start = time.time()
        for i in range(threads):
            t = eSS(queue)
            t.setDaemon(True)
            t.start()
        for line in lines:
            queue.put(line)
        queue.join()
        print time.time()-start
        g.close()

Upvotes: 0

Views: 3180

Answers (1)

Doug Haney

Reputation: 54

Three approaches come to mind. All of them rely on queuing an index along with each item to be processed.

  • The first is a controller/workers/output design, in which a dedicated output thread dequeues the worker-processed data, reassembles it in index order, and writes it out.
  • The second is to use a memory-mapped output file and use the index to calculate the offset at which to write each result (this probably assumes fixed-length records).
  • The third is to use the index to store each processed item in a new list, and write the items out once the list is complete rather than on the fly.
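As a rough illustration of the third approach, here is a minimal sketch. It is written for Python 3 (the module is `queue` rather than `Queue`), and the names `process`, `worker`, and `process_in_order` are hypothetical, with `process` standing in for the question's `ess` method. Each line is queued together with its index, and every worker stores its result at that index in a pre-sized list, so the list is already in input order when the queue is drained.

```python
import queue
import threading


def process(line):
    # Hypothetical stand-in for the real per-line work (ess in the question).
    return line + "a"


def worker(tasks, results):
    while True:
        index, line = tasks.get()
        try:
            # Each index is written by exactly one worker, so no lock is needed.
            results[index] = line + "\t" + process(line)
        finally:
            tasks.task_done()


def process_in_order(lines, num_threads=20):
    tasks = queue.Queue()
    results = [None] * len(lines)  # one slot per input line
    for _ in range(num_threads):
        threading.Thread(
            target=worker, args=(tasks, results), daemon=True
        ).start()
    # Queue the index along with the line so the result can be placed correctly.
    for index, line in enumerate(lines):
        tasks.put((index, line))
    tasks.join()  # wait until every queued item has been processed
    return results


if __name__ == "__main__":
    lines = ["1\ta", "2\tb", "3\tc", "4\ta", "5\td"]
    for row in process_in_order(lines):
        print(row)
```

The output file would then be written in a single pass over the returned list after `tasks.join()`, which also keeps the workers from writing to a shared file handle concurrently. The trade-off is that all results are held in memory until processing finishes.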

Upvotes: 3
