LarsVegas

Reputation: 6812

Python: How to speed up creating of objects?

I'm creating objects from a rather large txt file. My code works but takes a long time to run, because the elements I'm looking for are not ordered and not (necessarily) unique. For example, I am looking for a digit code that might be used twice in the file, once in the first row and once in the last. My idea was to check how often a certain code is used...

counter=collections.Counter([l[3] for l in self.body])

...and then loop through the counter. The advantage: if a code is used only once, you don't have to iterate over the whole file. However, you are still stuck with a lot of iterations, which makes the process really slow.

So my question really is: how can I improve my code? Another idea, of course, is to order the data first, but that could take quite long as well.

The crucial part is this method:

def get_pc(self):
    counter=collections.Counter([l[3] for l in self.body])
    # This returns something like Counter({'187': 2, '199': 1, ...})

    pcode = []

    #loop through entries of counter
    for k,v in counter.iteritems():
        i = 0
        #find post code in body
        for l in self.body:
            if i == v:
                break
            # find first appearance of key
            if l[3] == k:
                #first encounter...
                if i == 0:
                    #...so create object
                    self.pc = CodeCana(k,l[2])
                    pcode.append(self.pc)
                i += 1
                # make attributes
                self.pc.attr((l[0],l[1]),l[4])
            if v <= 1:
                break
    return pcode

I hope the code explains the problem sufficiently. If not, let me know and I will expand the provided information.

Upvotes: 0

Views: 1061

Answers (1)

Martijn Pieters

Reputation: 1121594

You are looping over body way too many times. Collapse this into one loop, and track the CodeCana items in a dictionary instead:

def get_pc(self):
    pcs = dict()    
    pcode = []

    for l in self.body:
        pc = pcs.get(l[3])
        if pc is None:
            pc = pcs[l[3]] = CodeCana(l[3], l[2])
            pcode.append(pc)
        pc.attr((l[0],l[1]),l[4])

    return pcode

Counting all items first and then trying to cut each pass over body short, while still making one pass per distinct code, defeats the purpose somewhat: the work still grows with the number of distinct codes times the length of body.
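To try the single-pass idea outside your class, here is a minimal self-contained sketch. The CodeCana below is only a hypothetical stand-in (I don't know what your real class does beyond the constructor and attr calls shown in the question), get_pc is written as a plain function taking body as an argument, and the sample rows are made up:

```python
class CodeCana(object):
    """Hypothetical stand-in for the asker's class (assumption, not the real one)."""

    def __init__(self, code, name):
        self.code = code
        self.name = name
        self.attrs = []

    def attr(self, key, value):
        # Just record what was passed in; the real method may do more.
        self.attrs.append((key, value))


def get_pc(body):
    pcs = {}    # code -> CodeCana, for constant-time lookup
    pcode = []  # objects in order of first appearance

    for row in body:
        pc = pcs.get(row[3])
        if pc is None:
            # First time this code is seen: create and remember the object.
            pc = pcs[row[3]] = CodeCana(row[3], row[2])
            pcode.append(pc)
        pc.attr((row[0], row[1]), row[4])

    return pcode


# Made-up sample rows: the code sits at index 3 and '187' appears twice.
body = [
    ("a", "b", "north", "187", 1),
    ("c", "d", "south", "199", 2),
    ("e", "f", "north", "187", 3),
]
result = get_pc(body)
```

Each row is touched exactly once, and the dictionary lookup replaces the inner search, so the position of the repeated code in the file no longer matters.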

You may want to consider giving the various indices in l names. You can use tuple unpacking:

for foo, bar, baz, egg, ham in self.body:
    pc = pcs.get(egg)
    if pc is None:
        pc = pcs[egg] = CodeCana(egg, baz)
        pcode.append(pc)
    pc.attr((foo, bar), ham)

but building body out of a namedtuple-based class would help in code documentation and debugging even more.
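A rough sketch of what the namedtuple variant could look like. The field names (x, y, name, code, value) are placeholders I made up, since the question doesn't say what the five columns mean, and the raw rows are sample data:

```python
from collections import namedtuple

# Placeholder field names (assumption); replace with whatever the columns really are.
Row = namedtuple("Row", ["x", "y", "name", "code", "value"])

# Made-up rows standing in for the parsed lines of the txt file.
raw = [
    ("a", "b", "north", "187", 1),
    ("c", "d", "south", "199", 2),
]

body = [Row(*fields) for fields in raw]

# Attribute access documents itself, and index access keeps working too.
codes = [row.code for row in body]
first_code_by_index = body[0][3]
```

Because a namedtuple is still a tuple, existing code that uses l[3] or tuple unpacking keeps working while new code can use row.code.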

Upvotes: 2
