maymay

Reputation: 33

Writing large number of large lists to a file

I want to test the efficiency of an algorithm, so I decided to generate a text file that contains all the input test cases. My main goal is to produce two non-intersecting lists, each with a maximum size of 10^5. The simplest approach I found was to build two such lists and grow them by one element per test case.

So the first set of lists will be

list1: -100000

list2: 0

Second set will be

list1: -100000 -99999

list2: 0 1

and so on, up to the 10^5-th iteration:

list1: -100000 -99999 ... -1

list2: 0 1 ... 99999

So I wrote this code

#!/usr/bin/python3

ip = open("input.dat", "w+")

for length in range(1, 10**5+1):
    arr1 = [var1 for var1 in range(length)]
    arr2 = [var2-(10**5) for var2 in range(length)]
    ip.writelines(["%s " % item  for item in arr1])
    ip.writelines("\n")
    ip.writelines(["%s " % item  for item in arr2])
    ip.writelines("\n\n")

ip.close()

But this is highly inefficient and takes a long time to run. Is there a more efficient way to do the same thing?

Upvotes: 3

Views: 559

Answers (2)

Pharoah Jardin

Reputation: 134

This code is more efficient but still useless for the 10**5 case.

#!/usr/bin/python3

ip = open("input.dat", "w+")

N = 10**4

string_arr1 = ""
string_arr2 = ""
for length in range(0, N):
    # Each line is the previous line plus one more item, so extend the strings
    string_arr1 += "%s " % length
    string_arr2 += "%s " % (length-N)
    ip.write(string_arr1 + "\n")
    ip.write(string_arr2 + "\n\n")

ip.close()

On my machine, it runs in under 2 seconds with N = 10**4.

Edit: corrected a few bugs.

Upvotes: 2

chqrlie

Reputation: 145307

It is unclear whether you want arrays with 10000 or 100000 items: 10^5 is 100000, not 10000 as mentioned in your question.

You do not need to create intermediate lists. Just iterate over ranges of integers directly. I get a 23% reduction in elapsed time (29.1s vs 37.7s) for 10000 elements with this code:

#!/usr/bin/python

# iterating to 10**4 generates a 538,995,000 byte file (539MB)
# iterating to 10**5 would produce more than 100x that much
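limit = 10000  # named "limit" to avoid shadowing the built-in max()

# the with block closes the file even if an error occurs
with open("input.dat", "w+") as ip:
    for length in range(1, limit+1):
        # first list: 0 .. length-1
        ip.writelines(["%s " % item  for item in range(length)])
        ip.writelines("\n")
        # second list: -limit .. length-limit-1
        ip.writelines(["%s " % item  for item in range(-limit, length-limit)])
        ip.writelines("\n\n")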
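The file size quoted in the comment can be checked without writing anything to disk. The following is a minimal sketch, not part of the measured code: the function name output_size is my own, and it assumes one byte per character and "\n" line endings (on Windows, text mode would write "\r\n" and the total would be larger).

```python
def output_size(n):
    """Bytes written when iterating lengths 1..n in the format used above."""
    total = 3 * n  # one "\n" after the first list, "\n\n" after the second
    for i in range(n):
        # Item number i is present in every list whose length exceeds i,
        # that is, in n - i of the n iterations.
        repeats = n - i
        total += (len(str(i)) + 1) * repeats      # "i " in the first list
        total += (len(str(i - n)) + 1) * repeats  # "i-n " in the second list
    return total


print(output_size(10**4))  # 538995000, the figure given in the comment
```

Calling output_size(10**5) gives the size to expect before committing to the larger run.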

Most of the time is spent converting the items to strings and writing them to the file. Creating the lists is comparatively cheap, especially when their creation is deferred, as is the case here. ip.writelines accepts any iterable object, so the sequence it consumes need not be fully constructed in memory.
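To illustrate the point about iterables, here is a minimal sketch that passes generator expressions to writelines, so no list of strings is built for each line. The function name write_cases and its parameters are my own, it assumes Python 3, and I have not timed it, so any speedup is unverified. It writes to any text stream, which makes it easy to try on a small case first.

```python
import io


def write_cases(out, limit):
    """Write the test cases for lengths 1..limit to the text stream `out`."""
    for length in range(1, limit + 1):
        # Generator expressions: each item is formatted only when
        # writelines asks for it, so no intermediate list exists.
        out.writelines("%s " % item for item in range(length))
        out.write("\n")
        out.writelines("%s " % item for item in range(-limit, length - limit))
        out.write("\n\n")


# Small demonstration on an in-memory buffer instead of a real file.
buf = io.StringIO()
write_cases(buf, 2)
print(repr(buf.getvalue()))  # '0 \n-2 \n\n0 1 \n-2 -1 \n\n'
```

For the real run, the same function could be called as write_cases(ip, 10**4) with ip opened as in the code above.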

Upvotes: 1
