Reputation: 8412
I'm writing a program that takes a string and computes all possible permutations with repetition of that string. I'll show some fragments of my code; I would be grateful if someone could point out how to improve the speed when sending the data to a file.
Sending the output to stdout took about 12 seconds to write 531,441 lines (3 MB):
import itertools
for word in itertools.product('abcdefghi', repeat=6):
    # product() yields tuples of characters, so join them into a string
    print(''.join(word))
Then I tried sending the output to a file instead of stdout, and this took roughly 5 minutes:
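import itertools
word_counter = 0
for word in itertools.product('abcdefghi', repeat=6):
    word_counter = word_counter + 1
    if word_counter == 1:
        # first word: create (or truncate) the file
        open('myfile', 'w').write(''.join(word))
    else:
        # every later word: reopen the file in append mode
        open('myfile', 'a').write(''.join(word))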
word_counter keeps track of the number of repeated permutations as the loop runs. When word_counter is 1 the program creates the file; afterwards, when word_counter is greater than 1, it appends the data to the file.
I used a program on the web to do the same thing. It took about the same time as mine when printing the data to a terminal, but only about 3 seconds to output these combinations to a file, while my program took 5 minutes to do that!
I also tried running my program and redirecting its output to a file in a bash terminal, and this took the same time as the web program (3 sec)!
'myprog' > 'output file'
Upvotes: 2
Views: 2670
Reputation: 28963
You are reopening the file for every write. Try not doing that:
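import itertools
# open the file once, before the loop
output = open('myfile', 'w')
for word in itertools.product('abcdefghi', repeat=6):
    # product() yields tuples of characters, so join them into a string
    output.write(''.join(word) + '\n')
output.close()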
[Edit with explanation] When you're working with 530,000 words, even making something a tiny bit slower for each word adds up to a LOT slower for the whole program.
My way, you do one piece of setup work (open the file) and put it in memory, then go through 500,000 words and save them, then do one piece of tidy up work (close the file). That's why the file is saved in a variable - so you can set it up once, and use it again and again.
Your way, you do almost no setup work first, then you add one to the counter 500,000 times, check the value of the counter 500,000 times, branch this way or that 500,000 times, open the file and force Windows (or Linux) to check your permissions every time, put it in memory 500,000 times, write to it 500,000 times, stop using the file you opened (because you didn't save it) so it falls into the 'garbage' and gets tidied up - 500,000 times, and then finish.
The amount of work is small each time, but when you do them all so many times, it adds up.
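To see the difference for yourself, here is a rough timing sketch. It is not the asker's exact program: the helper names (write_reopening, write_once, timed) are made up for illustration, it writes to a temporary directory, and it uses repeat=4 (6,561 words) so it finishes quickly. The actual numbers will vary with your OS and disk.

```python
import itertools
import os
import tempfile
import time


def write_reopening(path, words):
    """Reopen the file for every word (the slow pattern from the question)."""
    open(path, 'w').close()  # create/truncate the file once
    for word in words:
        with open(path, 'a') as f:  # a fresh open + close per word
            f.write(word + '\n')


def write_once(path, words):
    """Open the file once and reuse the same handle for every write."""
    with open(path, 'w') as f:
        for word in words:
            f.write(word + '\n')


def timed(func, path, words):
    """Return how many seconds func(path, words) takes."""
    start = time.perf_counter()
    func(path, words)
    return time.perf_counter() - start


# repeat=4 keeps the demo short: 9**4 = 6,561 words
words = [''.join(t) for t in itertools.product('abcdefghi', repeat=4)]

with tempfile.TemporaryDirectory() as tmp:
    slow_path = os.path.join(tmp, 'reopen.txt')
    fast_path = os.path.join(tmp, 'once.txt')
    slow = timed(write_reopening, slow_path, words)
    fast = timed(write_once, fast_path, words)
    # read both files back so we can check they contain the same lines
    with open(slow_path) as f:
        slow_lines = f.read().splitlines()
    with open(fast_path) as f:
        fast_lines = f.read().splitlines()

print(f"reopen per write: {slow:.3f}s, open once: {fast:.3f}s")
```

Both functions produce the same file; only the number of open/close calls differs, which is where the extra time goes.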
Upvotes: 5
Reputation: 740
The same as the previous answer, but with a context manager!
import itertools
with open('myfile', 'w') as output:
    for word in itertools.product('abcdefghi', repeat=6):
        # product() yields tuples of characters, so join them into a string
        output.write(''.join(word) + '\n')
Context managers have the benefit of cleaning up after themselves: the file is closed when the with block ends, even if an error occurs inside it.
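A small sketch of what that cleanup means in practice. The simulated RuntimeError and the temporary path are just for the demo; the point is that the file object is already closed (and its buffered data flushed) by the time the exception leaves the with block.

```python
import os
import tempfile

# throwaway location for the demo, not the 'myfile' from the question
path = os.path.join(tempfile.mkdtemp(), 'myfile')

try:
    with open(path, 'w') as output:
        output.write('aaaaaa\n')
        # pretend something goes wrong halfway through the loop
        raise RuntimeError('simulated failure mid-write')
except RuntimeError:
    pass

# the with block closed the file on the way out, despite the error
closed_after_error = output.closed

# closing flushed the buffer, so the line written before the error is on disk
with open(path) as f:
    contents = f.read()

print(closed_after_error, repr(contents))
```

With a plain open() and no close(), you would be relying on the garbage collector to close the file for you at some later point.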
Upvotes: 1