d.learious

Reputation: 103

Python Imaging Library loop performance getting slower with repetition

I have written a simple function to resize an image from 1500x2000px to 900x1200px.

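import os
from PIL import Image  # assumption: the post does not show this import; old PIL installs use "import Image"

def resizeImage(file_list):
    # Resize every image in file_list to 900x1200 and save it under resized/.
    if file_list:
        if not os.path.exists('resized'):
            os.makedirs('resized')
        count = 0
        for filename in file_list:
            count += 1
            im = Image.open(filename)
            # ANTIALIAS is the high-quality filter; newer Pillow releases call it Image.LANCZOS.
            im = im.resize((900, 1200), Image.ANTIALIAS)
            im.save(os.path.join('resized', filename), quality=90)
        print(str(count) + " files resized successfully")
    else:
        print("No files to resize")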

I used the timeit module to measure how long it takes to run with some example images. Here is an example of the results (times in seconds).

+---------------+-----------+---------------+---------------+---------------+
| Test Name     | No. files |      Min      |      Max      |    Average    |
+---------------+-----------+---------------+---------------+---------------+
| Resize normal |     10    | 5.25000018229 | 5.31371171493 | 5.27186083393 |
+---------------+-----------+---------------+---------------+---------------+

But if I repeat the test, the times gradually keep increasing, e.g.:

+---------------+-----------+---------------+---------------+---------------+
| Test Name     | No. files |      Min      |      Max      |    Average    |
+---------------+-----------+---------------+---------------+---------------+
| Resize normal |     10    | 5.36660298734 | 5.57177596057 | 5.45903467485 |
+---------------+-----------+---------------+---------------+---------------+

+---------------+-----------+---------------+---------------+---------------+
| Test Name     | No. files |      Min      |      Max      |    Average    |
+---------------+-----------+---------------+---------------+---------------+
| Resize normal |     10    | 5.58739076382 | 5.76515489024 | 5.70014196601 |
+---------------+-----------+---------------+---------------+---------------+

+---------------+-----------+---------------+---------------+-------------+
| Test Name     | No. files |      Min      |      Max      |   Average   |
+---------------+-----------+---------------+---------------+-------------+
| Resize normal |     10    | 5.77366483042 | 6.00337707034 | 5.891541538 |
+---------------+-----------+---------------+---------------+-------------+

+---------------+-----------+---------------+--------------+---------------+
| Test Name     | No. files |      Min      |     Max      |    Average    |
+---------------+-----------+---------------+--------------+---------------+
| Resize normal |     10    | 5.91993466793 | 6.1294756299 | 6.03516199948 |
+---------------+-----------+---------------+--------------+---------------+

This is how I'm running the test.

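import glob
import os
import timeit

def resizeTest(repeats):
    os.chdir('C:/Users/dominic/Desktop/resize-test')
    files = glob.glob('*.jpg')  # only used to report the file count

    # The setup string runs once per repetition and builds the list passed to resizeImage.
    t = timeit.Timer(
        "resizeImage(filess)",
        setup="from imageToolkit import resizeImage; import glob; filess = glob.glob('*.jpg')"
    )
    # One resizeImage call per repetition; returns a list of timings in seconds.
    times = t.repeat(repeats, 1)

    # averageTime and resultsTable are helper functions of mine (not shown here).
    results = {
        'name': 'Resize normal',
        'files': len(files),
        'min': min(times),
        'max': max(times),
        'average': averageTime(times)
    }
    resultsTable(results)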

I have moved the images being processed from my mechanical hard drive to the SSD, and the issue persists. I have also checked the memory being used: it stays pretty steady through all the runs, topping out at around 26 MB, and the process uses around 12% of one CPU core.

Going forward, I'd like to experiment with the multiprocessing library to increase the speed, but I'd like to get to the bottom of this issue first.

Would this be an issue with my loop that causes the performance to degrade?

Upvotes: 0

Views: 1323

Answers (1)

Martijn Pieters

Reputation: 1121168

The im.save() call is what slows things down; repeatedly writing to the same directory is perhaps thrashing the OS disk caches. When you removed the call, the OS was able to speed up the image reads through its disk cache.
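One way to check which step is responsible is to time the open, resize and save steps separately. What follows is a minimal sketch, not the code from the question: time_phases and pil_phases are hypothetical helper names, it assumes Python 3 and Pillow, and it uses Image.LANCZOS, the newer name for the ANTIALIAS filter. Image.open is lazy, so the sketch calls load() to make the read and decode cost show up under the open step rather than under resize.

```python
import time
from collections import OrderedDict


def time_phases(items, phases):
    """Run each item through the phases in order and total the time per phase.

    phases is a list of (name, function) pairs; each function receives the
    previous phase's return value (the first one receives the item itself).
    """
    totals = OrderedDict((name, 0.0) for name, _ in phases)
    for item in items:
        value = item
        for name, func in phases:
            start = time.perf_counter()
            value = func(value)
            totals[name] += time.perf_counter() - start
    return totals


def pil_phases(out_dir='resized'):
    """Open / resize / save phases mirroring resizeImage (assumes Pillow).

    out_dir must already exist.
    """
    import os
    from PIL import Image  # imported lazily so time_phases has no dependency

    def open_image(filename):
        im = Image.open(filename)
        im.load()  # force the actual file read and decode to happen here
        return filename, im

    def resize(pair):
        filename, im = pair
        return filename, im.resize((900, 1200), Image.LANCZOS)

    def save(pair):
        filename, im = pair
        im.save(os.path.join(out_dir, filename), quality=90)
        return filename

    return [('open', open_image), ('resize', resize), ('save', save)]
```

Calling time_phases(glob.glob('*.jpg'), pil_phases()) once per repetition and printing the totals should show whether only the save total grows from run to run, or whether the other steps drift as well.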

If your machine has multiple CPU cores, you can indeed speed up the resize process with multiprocessing, as the OS will schedule the sub-processes across those cores to run the resize operations in parallel. You won't get a linear performance improvement, as all those processes still have to access the same disk for both reads and writes.
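As a rough illustration of the multiprocessing approach, here is a minimal sketch that assumes Python 3 and Pillow. The names target_path, resize_one and resize_all are hypothetical, and the size, quality and output folder are copied from the question. It uses Image.LANCZOS, the newer name for the ANTIALIAS filter.

```python
import glob
import os
from multiprocessing import Pool

OUT_DIR = 'resized'  # same output folder as resizeImage in the question


def target_path(filename):
    """Where the resized copy of filename is written."""
    return os.path.join(OUT_DIR, os.path.basename(filename))


def resize_one(filename):
    """Worker: resize a single image; runs inside a child process."""
    from PIL import Image  # assumes Pillow is installed
    im = Image.open(filename)
    im = im.resize((900, 1200), Image.LANCZOS)
    out = target_path(filename)
    im.save(out, quality=90)
    return out


def resize_all(file_list, processes=None):
    """Resize file_list across a pool of worker processes.

    processes=None lets multiprocessing start one worker per CPU core.
    """
    if not file_list:
        return []
    if not os.path.exists(OUT_DIR):
        os.makedirs(OUT_DIR)
    with Pool(processes) as pool:
        # map blocks until every file has been processed
        return pool.map(resize_one, file_list)


if __name__ == '__main__':
    # The guard matters on Windows, where child processes re-import this module.
    written = resize_all(glob.glob('*.jpg'))
    print('%d files resized' % len(written))
```

Each worker still reads from and writes to the same disk, so timing this against the single-process loop on the same set of files is the only way to see how much it actually gains.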

Upvotes: 1
