Reputation: 807
I have the following code for producing a big text file:
import random

d = 3
n = 100000
f = open("input.txt", 'a')
s = ""
for j in range(0, d-1):
    s += str(round(random.uniform(0, 1000), 3)) + " "
s += str(round(random.uniform(0, 1000), 3))
f.write(s)
for i in range(0, n-1):
    s = ""
    for j in range(0, d-1):
        s += str(round(random.uniform(0, 1000), 3)) + " "
    s += str(round(random.uniform(0, 1000), 3))
    f.write("\n" + s)
f.close()
But it seems pretty slow even to generate 5 GB this way.
How can I make it faster? I want the output to look like:
796.802 691.462 803.664
849.483 201.948 452.155
144.174 526.745 826.565
986.685 238.462 49.885
137.617 416.243 515.474
366.199 687.629 423.929
Upvotes: 1
Views: 5510
Reputation: 1
If you want an effectively unbounded loop to generate a very large file without a fixed size limit, you can do it like this:
import random

d = 3
f = open('input.txt', 'w')
for i in range(10**9):
    nums = [str(round(random.uniform(0, 1000), 3)) for j in range(d)]
    f.write(' '.join(nums))
    f.write('\n')
f.close()
The loop keeps writing until you stop it with Ctrl-C (or until it reaches 10**9 lines).
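The "run until Ctrl-C" idea can be made explicit by catching KeyboardInterrupt, so the file is closed cleanly when the user interrupts. This is a sketch; the helper name and the optional max_lines cap are my own additions for illustration:

```python
import random

def write_random_lines(path, d=3, max_lines=None):
    # Hypothetical helper: writes rows of d space-separated random values,
    # one row per line, until max_lines is reached or the user presses Ctrl-C.
    written = 0
    with open(path, 'w') as f:
        try:
            while max_lines is None or written < max_lines:
                nums = [str(round(random.uniform(0, 1000), 3)) for _ in range(d)]
                f.write(' '.join(nums) + '\n')
                written += 1
        except KeyboardInterrupt:
            pass  # stop cleanly on Ctrl-C; the with-block closes the file
    return written
```

Calling it with a cap, e.g. write_random_lines('input.txt', max_lines=1000), is handy for testing; with no cap it runs until interrupted.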
Upvotes: 0
Reputation: 13576
Well, of course, the whole thing is I/O bound. You can't output the file faster than the storage device can write it. Leaving that aside, there are some optimizations that could be made.
Your method of building up a long string from several shorter strings is suboptimal. You're saying, essentially, s = s1 + s2. When you tell Python to do this, it concatenates two string objects to make a new string object. This is slow, especially when repeated.
A much better way is to collect the individual string objects in a list or other iterable, then use the join method to run them together. For example:
>>> ''.join(['a', 'b', 'c'])
'abc'
>>> ', '.join(['a', 'b', 'c'])
'a, b, c'
Instead of n-1 string concatenations to join n strings, this does the whole thing in one step.
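As a quick sanity check, the two approaches produce identical strings; only the cost differs, since each += generally copies the partial result, while join builds the output in a single pass. A minimal comparison, using made-up values in the question's format:

```python
parts = ['796.802', '691.462', '803.664']

# The question's pattern: repeated += creates a brand-new string each time
s1 = ''
for p in parts[:-1]:
    s1 += p + ' '
s1 += parts[-1]

# One join call builds the same result in a single pass
s2 = ' '.join(parts)

assert s1 == s2  # identical output, very different cost at scale
```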
There's also a lot of repeated code that could be combined. Here's a cleaner design, still using the loops.
import random

d = 3
n = 1000
f = open('input.txt', 'w')
for i in range(n):
    nums = []
    for j in range(d):
        nums.append(str(round(random.uniform(0, 1000), 3)))
    s = ' '.join(nums)
    f.write(s)
    f.write('\n')
f.close()
A cleaner, briefer, more Pythonic way is to use a list comprehension:
import random

d = 3
n = 1000
f = open('input.txt', 'w')
for i in range(n):
    nums = [str(round(random.uniform(0, 1000), 3)) for j in range(d)]
    f.write(' '.join(nums))
    f.write('\n')
f.close()
Note that in both cases, I wrote the newline separately. That should be faster than concatenating it to the string, since I/O is buffered anyway. If I were joining a list of strings without separators, I'd just tack on a newline as the last string before joining.
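A further step in the same spirit, sketched under the assumption that memory allows buffering a chunk of rows at a time: build many lines, join them into one string (each line already ending in a newline), and issue a single write per chunk. The helper names here are my own, not from the thread:

```python
import random

def make_line(d=3):
    # One row of d random values; the newline is tacked on after joining
    parts = [str(round(random.uniform(0, 1000), 3)) for _ in range(d)]
    return ' '.join(parts) + '\n'

def write_chunked(path, n, d=3, chunk=10000):
    # Buffer `chunk` lines at a time and write them with a single call
    with open(path, 'w') as f:
        for start in range(0, n, chunk):
            count = min(chunk, n - start)
            f.write(''.join(make_line(d) for _ in range(count)))
```

Whether this beats line-by-line writes depends on the platform's I/O buffering, so it is worth timing before committing to it.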
As Daniel's answer says, numpy is probably faster, but maybe you don't want to get into numpy yet; it sounds like you're kind of a beginner at this point.
Upvotes: 2
Reputation: 85432
This could be a bit faster:
nlines = 100000
col = 3
for line in range(nlines):
    f.write('{} {} {}\n'.format(*(round(random.uniform(0, 1000), 3)
                                  for e in range(col))))
or use format specifiers to control width and precision:
for line in range(nlines):
    numbers = [random.uniform(0, 1000) for e in range(col)]
    f.write('{:6.3f} {:6.3f} {:6.3f}\n'.format(*numbers))
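One detail worth noting about the second variant: a minimum field width like 6 pads short values with leading spaces, so columns line up, but the output differs slightly from the plain '{}' version in the question. A small check:

```python
x = 3.142
# Width 6 pads the 5-character value with a leading space
assert '{:6.3f}'.format(x) == ' 3.142'
# The plain form has no padding
assert '{}'.format(round(x, 3)) == '3.142'
# Values already wider than 6 characters are not padded
assert '{:6.3f}'.format(796.802) == '796.802'
```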
Upvotes: 1
Reputation: 42748
Using numpy is probably faster:
import numpy

d = 3
n = 100000
data = numpy.random.uniform(0, 1000, size=(n, d))
numpy.savetxt("input.txt", data, fmt='%.3f')
Upvotes: 2