Reputation: 4643
f = open('wl4.txt', 'w')
hh = 0
######################################
for n in range(1,5):
for l in range(33,127):
if n==1:
b = chr(l) + '\n'
f.write(b)
hh += 1
elif n==2:
for s0 in range(33, 127):
b = chr(l) + chr(s0) + '\n'
f.write(b)
hh += 1
elif n==3:
for s0 in range(33, 127):
for s1 in range(33, 127):
b = chr(l) + chr(s0) + chr(s1) + '\n'
f.write(b)
hh += 1
elif n==4:
for s0 in range(33, 127):
for s1 in range(33, 127):
for s2 in range(33,127):
b = chr(l) + chr(s0) + chr(s1) + chr(s2) + '\n'
f.write(b)
hh += 1
######################################
print "We Made %d Words." %(hh)
######################################
f.close()
So, is there any method to make it faster?
Upvotes: 2
Views: 1315
Reputation: 161
Do you need all of the words sorted by their length? If you can mingle the lengths together, you can improve slightly on John Kugelman's answer like this:
f = open("wl4.txt", "w")
chars = [chr(c) for c in range(33, 127)]
c = len(chars)
count = c + c*c + c**3 + c**4
for c0 in chars:
print >>f, c0
for c1 in chars:
s1 = c0 + c1
print >>f, s1
for c2 in chars:
s2 = s1 + c2
print >>f, s2
for c3 in chars:
print >>f, s2 + c3
print "We Made %d Words." % count
Directly calculating hh instead of all of the incrementing is also a big win (about 15% on this laptop). There's also an improvement from using print over f.write, though i have no idea why that's the case. This version runs in about 39 seconds for me.
Upvotes: 0
Reputation: 8966
When doing operations that involve iterating a lot, a good place to start is the itertools
package.
In this case it looks like you want the product
function. Which gives you the:
cartesian product, equivalent to a nested for-loop
So to get a list of the "words" you are creating:
from itertools import product
chars = map(chr, xrange(33,127)) # create a list of characters
words = [] # this will be the list of words
for length in xrange(1, 5): # length is the length of the words created
words.extend([''.join(x) for x in product(chars, repeat=length)])
# instead of keeping a separate counter, hh, we can use the len function
print "We Made %d Words." % (len(words))
f = open('wl4.txt', 'w')
f.write('\n'.join(words)) # write one word per line
f.close()
As a result we get the result that your script gives us. And since itertools
is implemented in c
, it is also faster.
Edit:
Per John Machin's very astute comment about the memory usage, here's updated code that doesn't give me a memory error when I run it on the whole range(33, 127).
from itertools import product
chars = map(chr, xrange(33,127)) # create a list of characters
f_words = open('wl4.txt', 'w')
num_words = 0 # a counter (was hh in OPs code)
for length in xrange(1, 5): # length is the length of the words created
for char_tup in product(chars, repeat=length):
f_words.write(''.join(char_tup) + '\n')
num_words += 1
f.close()
print "We Made %d Words." % (num_words)
This runs in about 4 minutes (240 seconds) on my machine.
Upvotes: 2
Reputation: 82924
Further significant improvements are possible.
The following script file demonstrates these, using (for brevity) only the size 4 loop (which takes up well over 90% of the time).
method 0: the OP's original code
method 1: John Kugleman's solution
method 2: (1) and move some string concatenation out of inner loops
method 3: (2) and put the code inside a function -- accessing local variables is MUCH faster than global variables. Any script can do this. Many scripts should do this.
method 4: (3) and accumulate strings in a list then join them and write them. Note that this uses memory like you may not believe. My code doesn't attempt to do it for the whole file, because (127 - 33) ** 4 is 78M strings. On a 32-bit box, that's 78 * 4 = 312Mb for the list alone (ignoring unused memory at the end of the list), plus 78 * 28 = 2184 Mb for the str objects (sys.getsizeof("1234") produces 28), plus 78 * 5 = 390 Mb for the join result. You just blew your user address space or your ulimit or something else blowable. Or if you have 1 Gb of real memory of which 128Mb has been snarfed by the video driver, but enough swap space, you have time for lunch (if running a particular OS, dinner as well).
method 5: (4) and don't ask the list for the whereabouts of its append attribute 78 million times :-)
Here is the script file:
import time, sys
time_function = time.clock # Windows; time.time may be better on *x
ubound, which = map(int, sys.argv[1:3])
t0 = time_function()
if which == 0:
### original ###
f = open('wl4.txt', 'w')
hh = 0
n = 4
for l in range(33, ubound):
if n == 1:
pass
elif n == 2:
pass
elif n == 3:
pass
elif n == 4:
for s0 in range(33, ubound):
for s1 in range(33, ubound):
for s2 in range(33,ubound):
b = chr(l) + chr(s0) + chr(s1) + chr(s2) + '\n'
f.write(b)
hh += 1
f.close()
elif which == 1:
### John Kugleman ###
f = open('wl4.txt', 'w')
chars = [chr(c) for c in range(33, ubound)]
hh = 0
for l in chars:
for s0 in chars:
for s1 in chars:
for s2 in chars:
b = l + s0 + s1 + s2 + '\n'
f.write(b)
hh += 1
f.close()
elif which == 2:
### JohnK, saving + ###
f = open('wl4.txt', 'w')
chars = [chr(c) for c in range(33, ubound)]
hh = 0
for L in chars: # "L" as in "Legible" ;-)
for s0 in chars:
b0 = L + s0
for s1 in chars:
b1 = b0 + s1
for s2 in chars:
b = b1 + s2 + '\n'
f.write(b)
hh += 1
f.close()
elif which == 3:
### JohnK, saving +, function ###
def which3func():
f = open('wl4.txt', 'w')
chars = [chr(c) for c in range(33, ubound)]
nwords = 0
for L in chars:
for s0 in chars:
b0 = L + s0
for s1 in chars:
b1 = b0 + s1
for s2 in chars:
b = b1 + s2 + '\n'
f.write(b)
nwords += 1
f.close()
return nwords
hh = which3func()
elif which == 4:
### JohnK, saving +, function, linesep.join() ###
def which4func():
f = open('wl4.txt', 'w')
chars = [chr(c) for c in range(33, ubound)]
nwords = 0
for L in chars:
accum = []
for s0 in chars:
b0 = L + s0
for s1 in chars:
b1 = b0 + s1
for s2 in chars:
accum.append(b1 + s2)
nwords += len(accum)
accum.append("") # so that we get a final newline
f.write('\n'.join(accum))
f.close()
return nwords
hh = which4func()
elif which == 5:
### JohnK, saving +, function, linesep.join(), avoid method lookup in loop ###
def which5func():
f = open('wl4.txt', 'w')
chars = [chr(c) for c in range(33, ubound)]
nwords = 0
for L in chars:
accum = []; accum_append = accum.append
for s0 in chars:
b0 = L + s0
for s1 in chars:
b1 = b0 + s1
for s2 in chars:
accum_append(b1 + s2)
nwords += len(accum)
accum_append("") # so that we get a final newline
f.write('\n'.join(accum))
f.close()
return nwords
hh = which5func()
else:
print "Bzzzzzzt!!!"
t1 = time_function()
print "Method %d made %d words in %.1f seconds" % (which, hh, t1 - t0)
Here are some results:
C:\junk\so>for %w in (0 1 2 3 4 5) do \python26\python wl4.py 127 %w
C:\junk\so>\python26\python wl4.py 127 0
Method 0 made 78074896 words in 352.3 seconds
C:\junk\so>\python26\python wl4.py 127 1
Method 1 made 78074896 words in 183.9 seconds
C:\junk\so>\python26\python wl4.py 127 2
Method 2 made 78074896 words in 157.9 seconds
C:\junk\so>\python26\python wl4.py 127 3
Method 3 made 78074896 words in 126.0 seconds
C:\junk\so>\python26\python wl4.py 127 4
Method 4 made 78074896 words in 68.3 seconds
C:\junk\so>\python26\python wl4.py 127 5
Method 5 made 78074896 words in 60.5 seconds
Update in response to OP's questions
""" When I try to add for loops, i got a memory error for accum_append.. what is the problem ??"""
I don't know what the problem is; I can't read your code at this distance. Guess: If you are trying to do length == 5, you have probably got the accum
initialisation and writing bits in the wrong place, and accum
is trying to grow beyond the capacity of your system's memory (as I hoped I'd explained earlier).
"""Now Method 5 is the fastest one, but its make a word tell length 4.. how could i make how much i want ?? :)"""
You have two choices: (1) you continue to use nested for loops (2) you look at the answers that don't use nested for loops, with the length specified dynamically.
Methods 4 and 5 got speedups by using accum
but the manner of doing that was tailored to the exact knowledge of how much memory would be used.
Below are 3 more methods. 101 is tgray's method with no extra memory use. 201 is Paul Hankin's method (plus some write-to-file code) similarly with no extra memory use. These two methods are of about the same speed and are within sight of method 3 speedwise. They both allow dynamic specification of the desired length.
Method 102 is tgray's method with a fixed 1Mb buffer -- it attempts to save time by reducing the number of calls to f.write() ... you may wish to experiment with the buffer size. You could create an orthogonal 202 method if you wished. Note that tgray's method uses itertools.product
for which you'll need Python 2.6, whereas Paul Hankin's method uses generator expressions which have been around for a while.
elif which == 101:
### tgray, memory-lite version
def which101func():
f = open('wl4.txt', 'w')
f_write = f.write
nwords = 0
chars = map(chr, xrange(33, ubound)) # create a list of characters
length = 4 #### length is a variable
for x in product(chars, repeat=length):
f_write(''.join(x) + '\n')
nwords += 1
f.close()
return nwords
hh = which101func()
elif which == 102:
### tgray, memory-lite version, buffered
def which102func():
f = open('wl4.txt', 'w')
f_write = f.write
nwords = 0
chars = map(chr, xrange(33, ubound)) # create a list of characters
length = 4 #### length is a variable
buffer_size_bytes = 1024 * 1024
buffer_size_words = buffer_size_bytes // (length + 1)
words_in_buffer = 0
buffer = []; buffer_append = buffer.append
for x in product(chars, repeat=length):
words_in_buffer += 1
buffer_append(''.join(x) + '\n')
if words_in_buffer >= buffer_size_words:
f_write(''.join(buffer))
nwords += words_in_buffer
words_in_buffer = 0
del buffer[:]
if buffer:
f_write(''.join(buffer))
nwords += words_in_buffer
f.close()
return nwords
hh = which102func()
elif which == 201:
### Paul Hankin (needed output-to-file code added)
def AllWords(n, CHARS=[chr(i) for i in xrange(33, ubound)]):
#### n is the required word length
if n == 1: return CHARS
return (w + c for w in AllWords(n - 1) for c in CHARS)
def which201func():
f = open('wl4.txt', 'w')
f_write = f.write
nwords = 0
for w in AllWords(4):
f_write(w + '\n')
nwords += 1
f.close()
return nwords
hh = which201func()
Upvotes: 8
Reputation:
Here's a short recursive solution.
def AllWords(n, CHARS=[chr(i) for i in xrange(33, 127)]):
if n == 1: return CHARS
return (w + c for w in AllWords(n - 1) for c in CHARS)
for i in xrange(1, 5):
for w in AllWords(i):
print w
PS: is it an error that character 127 is excluded?
Upvotes: 0
Reputation: 5949
How about this it workes with arbitary word lengths: (Password generator?)
f = open('wl4.txt', 'w')
hh=0
chars = map(chr,xrange(33, 127))
def func(n, result):
if (n == 0):
f.write(result + "\n")
hh +=1
else:
for c in chars:
func(n-1, result+c)
for n in range(1, 5):
func(n,"")
######################################
print "We Made %d Words." %(hh)
######################################
f.close()
Upvotes: 1
Reputation: 361585
You can create the range(33, 127)
once and save it off. Not having to create it repeatedly cuts the runtime in half on my machine.
chars = [chr(c) for c in range(33, 127)]
...
for s0 in chars:
for s1 in chars:
for s2 in chars:
b = l + s0 + s1 + s2 + '\n'
f.write(b)
hh += 1
Upvotes: 7
Reputation: 46607
The outer loop seems pretty pointless. Why not simply:
for l in range(33,127)
.. your code for the n==1 case
for l in range(33,127)
.. your code for the n==2 case
for l in range(33,127)
.. your code for the n==3 case
for l in range(33,127)
.. your code for the n==4 case
That will be both faster and easier read.
Upvotes: 2