little star

Reputation: 81

How to resolve the Python list append performance problem below

I have the Python code below, and it hits an out-of-memory error when appending to a list 100 million times in a loop. The same code in Java doesn't have this issue at all, without any tuning.

  1. Is there any way to tune this with a Python command-line option, similar to the Java HotSpot JVM options?

  2. Is there any way to change the code so that it runs faster and uses less memory?


import datetime

mylist = []

before = datetime.datetime.now()

for _ in range(100000000):
    mylist.append(datetime.datetime.now())

print("List length-->", len(mylist))

after = datetime.datetime.now()

# total_seconds() reports the whole elapsed duration, including fractions
print('Python time taken in seconds--->', (after - before).total_seconds())

Post notes:

Memory leak detection on this "datetime.datetime.now()"

I am sharing my Java code here. It runs more than 10 times faster, without any JVM tuning, and the process completes in about 6 seconds.

Anyway, Java does a much better job of garbage collection than Python. Java has not normally crashed on this kind of simple operation for the past 20 years. https://www.snaplogic.com/glossary/python-vs-java-performance

Note: Changing from System.currentTimeMillis() to new Date() doesn't make a difference.

package demo;

import java.util.ArrayList;
import java.util.List;

public class Performance {

    public static void main(String[] args) {
        List<Long> mylist = new ArrayList<Long>();

        long before = System.currentTimeMillis();

        for (int i = 0; i < 100000000; i++) {
            mylist.add(System.currentTimeMillis());

        }

        long after = System.currentTimeMillis();
        System.out.println("Java time taken in milliseconds--->" + (after - before));

    }

}

Upvotes: 0

Views: 83

Answers (2)

Booboo

Reputation: 44148

Could you use a generator expression? You cannot take the length of such an expression, because the values are only generated as you iterate over it (and thus the memory requirements are extremely low). Here is a demo:

import datetime
import time


before = datetime.datetime.now()
mylist = (datetime.datetime.now() for _ in range(100000000))
after = datetime.datetime.now()

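# the following would raise TypeError, because a generator has no len():
#print("List length-->", len(mylist))

# only creating the generator is timed here; no datetimes have been produced yet
print('Python time taken in seconds--->', (after - before).seconds)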

#get first 5 datetimes:
n = 0
for dt in mylist:
    print(dt)
    n += 1
    if n == 5:
        break

#get next 5 datetimes with sleeping:
time.sleep(1)
n = 0
for dt in mylist:
    print(dt)
    n += 1
    if n == 5:
        break
    time.sleep(1)

Prints:

Python time taken in seconds---> 0
2020-11-23 10:06:55.054316
2020-11-23 10:06:55.054316
2020-11-23 10:06:55.054316
2020-11-23 10:06:55.054316
2020-11-23 10:06:55.054316
2020-11-23 10:06:56.054372
2020-11-23 10:06:57.054869
2020-11-23 10:06:58.055935
2020-11-23 10:06:59.056067
2020-11-23 10:07:00.056201

Of course, you might as well just call datetime.datetime.now() whenever you want a new value rather than using a generator expression for this particular case. But the above shows the usefulness of generator expressions in general.
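The demo above also relies on a generator expression being consumed only once, which is why the second loop continues from the sixth value instead of starting over. A minimal sketch of the same idea using itertools.islice from the standard library (the variable names here are illustrative, not from the original code):

```python
import datetime
import itertools

# Lazily produce timestamps; nothing is computed until we iterate.
timestamps = (datetime.datetime.now() for _ in range(100_000_000))

# islice pulls only the requested number of items from the generator.
first_five = list(itertools.islice(timestamps, 5))

# The generator remembers its position, so this yields items 6 through 10.
next_five = list(itertools.islice(timestamps, 5))

print(len(first_five), len(next_five))
```

Only ten datetime objects ever exist in memory here, although the generator is nominally 100 million items long.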

Upvotes: 2

Hadrian

Reputation: 927

Floats representing the Unix epoch timestamp use less memory than datetime.datetime objects:

>>> import datetime, sys, time
>>> sys.getsizeof(datetime.datetime.now())
48
>>> sys.getsizeof(time.time())
24

You could do this:

import time
mylist = []
for _ in range(100000000):
    mylist.append(time.time())
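To put the per-object sizes in context, here is a rough back-of-the-envelope sketch of the total memory a 100-million-element list would need. It assumes a 64-bit CPython build where each list slot holds an 8-byte reference, and it ignores list over-allocation and allocator overhead, so treat the result as a lower bound. The helper estimate_bytes is hypothetical and written only for this illustration.

```python
import datetime
import sys
import time

N = 100_000_000  # number of appends in the question

# Assumption: 64-bit CPython, one 8-byte reference per list slot.
POINTER_SIZE = 8


def estimate_bytes(sample, count=N):
    """Rough lower bound: per-object size plus one list slot per element."""
    return count * (sys.getsizeof(sample) + POINTER_SIZE)


datetime_bytes = estimate_bytes(datetime.datetime.now())
float_bytes = estimate_bytes(time.time())

print(f"datetime list: ~{datetime_bytes / 1e9:.1f} GB")
print(f"float list:    ~{float_bytes / 1e9:.1f} GB")
```

With the sizes shown above (48 and 24 bytes), this works out to roughly 5.6 GB for datetime objects versus 3.2 GB for floats. Storing floats helps, but the list is still large.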

Upvotes: 1
