Reputation: 31
Hey all, I have some code to read certain lines from a file, and wanted to know if it would run faster as a list comprehension or a generator expression/function. And if it does run faster, how would the code look? Still learning Python. Thanks for your help
input = open('C:/.../list.txt', 'r')
output = open('C:/.../output.txt', 'w')
x=0
for line in input:
x = x+1
if x > 2 and x < 5:
output.write(line)
list file has
1
2
3
4
5
output in new file is
3
4
Upvotes: 1
Views: 2950
Reputation: 56694
def copyLines(infname, outfname, lines):
lines = list(set(lines)) # remove duplicates
lines.sort(reverse=True)
with open(infname, 'r') as inf, open(outfname, 'w') as outf:
try:
i = 1
while lines:
seek = lines.pop()
while i<seek:
inf.next()
i += 1
outf.write(inf.next())
i += 1
except StopIteration: # hit end of file
pass
def main():
copyLines('C:/.../list.txt', 'C:/.../output.txt', range(3,5))
if __name__=="__main__":
main()
Note that this will quit as soon as it runs out of desired lines.
Upvotes: 0
Reputation: 2822
The best way to find out which is fastest is to test it!
In this code I'm assuming that you care about the value of the line, and not which line number.
import timeit
def test_comprehension():
input = open('list.txt')
output = open('output.txt','w')
[output.write(x) for x in input if int(x) > 2 and int(x) < 5]
def test_forloop():
input = open('list.txt')
output = open('output.txt','w')
for x in input:
if int(x) > 2 and int(x) < 5:
output.write(x)
if __name__=='__main__':
times = 10000
from timeit import Timer
t = Timer("test_comprehension()", "from __main__ import test_comprehension")
print "Comprehension: %s" % t.timeit(times)
t = Timer("test_forloop()", "from __main__ import test_forloop")
print "For Loop: %s" % t.timeit(times)
In this I'm just setting up a couple functions, one that does it with a list comprehension, and another that does it as a forloop. The timeit module runs small bits of code the number of times you specify, times it and returns the time it took to run. So if you run the above code you'll get an output of something along the lines of:
Comprehension: 0.957081079483 For Loop: 0.956691980362
Depressingly enough, it's about the same either way.
Upvotes: 0
Reputation: 27118
You asked specifically about generators vs. list comprehension, but in general there are a few was to approach the problem.
The generator version:
input = open('input.txt', 'r')
output = open('output.txt', 'w')
def gen() :
for line in input :
yield "FOO " + line
for l in gen() :
output.write(l)
List comprehension:
output.writelines("FOO " + line for line in input)
Iterator style:
class GenClass(object) :
def __init__(self, _in) :
self.input = _in
def __iter__(self):
return self
def next(self) :
line = self.input.readline()
if len(line) == 0 :
raise StopIteration
return "FOO " + line
output.writelines(GenClass(input))
Thoughts:
Upvotes: 0
Reputation: 14090
Not faster, but if you want to use a list comprehension:
output.writelines([line for (x, line) in enumerate(input) if 1 < x < 4])
This assumes you're using the actual line count of the file position and not the read value in the file (which, judging by your assignment of x, is true).
Upvotes: 1
Reputation: 2605
If you want to do it with a generator:
output.writelines(line for line in input if 2 < int(line) < 5)
Upvotes: 2
Reputation: 799430
No need for a list comprehension.
output.write(''.join(itertools.islice(inputfile, 2, 4))
Upvotes: 6