John_U262D
John_U262D

Reputation: 31

for loop to a list comprehension

Hey all, I have some code to read certain lines from a file, and wanted to know if it would run faster as a list comprehension or a generator expression/function. And if it does run faster, how would the code look? Still learning Python. Thanks for your help

input = open('C:/.../list.txt', 'r')
output = open('C:/.../output.txt', 'w')

x=0

for line in input:
    x = x+1
    if x > 2 and x < 5:
        output.write(line)

list file has

1
2
3
4
5

output in new file is

3
4

Upvotes: 1

Views: 2950

Answers (6)

Hugh Bothwell
Hugh Bothwell

Reputation: 56694

def copyLines(infname, outfname, lines):
    lines = list(set(lines))   # remove duplicates
    lines.sort(reverse=True)
    with open(infname, 'r') as inf, open(outfname, 'w') as outf:
        try:
            i = 1
            while lines:
                seek = lines.pop()
                while i<seek:
                    inf.next()
                    i += 1
                outf.write(inf.next())
                i += 1
        except StopIteration:  # hit end of file
            pass

def main():
    copyLines('C:/.../list.txt', 'C:/.../output.txt', range(3,5))

if __name__=="__main__":
    main()

Note that this will quit as soon as it runs out of desired lines.

Upvotes: 0

Chris Pickett
Chris Pickett

Reputation: 2822

The best way to find out which is fastest is to test it!

In this code I'm assuming that you care about the value of the line, and not which line number.

import timeit

def test_comprehension():
  input = open('list.txt')
  output = open('output.txt','w')
  [output.write(x) for x in input if int(x) > 2 and int(x) < 5]

def test_forloop():
    input = open('list.txt')
    output = open('output.txt','w')

    for x in input:
        if int(x) > 2 and int(x) < 5:
            output.write(x)

if __name__=='__main__':
    times = 10000

    from timeit import Timer
    t = Timer("test_comprehension()", "from __main__ import test_comprehension")
    print "Comprehension: %s" % t.timeit(times)

    t = Timer("test_forloop()", "from __main__ import test_forloop")
    print "For Loop: %s" % t.timeit(times)

In this I'm just setting up a couple functions, one that does it with a list comprehension, and another that does it as a forloop. The timeit module runs small bits of code the number of times you specify, times it and returns the time it took to run. So if you run the above code you'll get an output of something along the lines of:

Comprehension: 0.957081079483 For Loop: 0.956691980362

Depressingly enough, it's about the same either way.

Upvotes: 0

koblas
koblas

Reputation: 27118

You asked specifically about generators vs. list comprehension, but in general there are a few was to approach the problem.

The generator version:

input  = open('input.txt', 'r')
output = open('output.txt', 'w')

def gen() :
    for line in input :
        yield "FOO " + line

for l in gen() :
    output.write(l)

List comprehension:

output.writelines("FOO " + line for line in input)

Iterator style:

class GenClass(object) :
    def __init__(self, _in) :
        self.input = _in

    def __iter__(self):
        return self

    def next(self) :
        line = self.input.readline()
        if len(line) == 0 :
            raise StopIteration
        return "FOO " + line

output.writelines(GenClass(input))

Thoughts:

  • List comprehension is going to have everything in memory
  • List comprehension will restrict the amount of code (functions are oneline)
  • Generator is more flexible in coding practices
  • Iterator style, gives you probably the most flexibility
  • Slightly higher initialization cost (object)

Upvotes: 0

Jeffrey Bauer
Jeffrey Bauer

Reputation: 14090

Not faster, but if you want to use a list comprehension:

    output.writelines([line for (x, line) in enumerate(input) if 1 < x < 4])

This assumes you're using the actual line count of the file position and not the read value in the file (which, judging by your assignment of x, is true).

Upvotes: 1

user35147863
user35147863

Reputation: 2605

If you want to do it with a generator:

output.writelines(line for line in input if 2 < int(line) < 5)

Upvotes: 2

Ignacio Vazquez-Abrams
Ignacio Vazquez-Abrams

Reputation: 799430

No need for a list comprehension.

output.write(''.join(itertools.islice(inputfile, 2, 4))

Upvotes: 6

Related Questions