LOTRFan

Reputation: 11

Python performance slows with increasing memory usage

I have the following function that converts an image into a list of hash values (using PIL):

def _GetImageHash(image):
  st = time.time()
  image_list = list(image.getdata())  # flatten all pixels into one list
  (columns, rows) = image.size
  hash_vals = [0]*rows
  for i in xrange(0, rows):
    # hash one row of pixels at a time
    hash_vals[i] = hash(tuple(image_list[i*columns:(i+1)*columns]))
  print "_GetImageHash time taken: ", time.time() - st
  return hash_vals, image_list

In another function, I call this method on many image files and store the resulting lists. However, the time this function takes to compute the hash values increases significantly with each call. Changing the order of the calls does not change this behavior, and all the images are the same size, so each call should take roughly the same time. In fact, even if I do:

image1_hash, image1_list = _GetImageHash(image1)
image2_hash, image2_list = _GetImageHash(image1)
image3_hash, image3_list = _GetImageHash(image1)
image4_hash, image4_list = _GetImageHash(image1)
image5_hash, image5_list = _GetImageHash(image1) ...

The times reported are like this:

_GetImageHash time taken:  0.672996044159
_GetImageHash time taken:  1.40435290337
_GetImageHash time taken:  2.10946083069
_GetImageHash time taken:  2.84965205193
_GetImageHash time taken:  3.57753205299
_GetImageHash time taken:  4.71754598618
_GetImageHash time taken:  5.10348200798
_GetImageHash time taken:  5.83603620529
_GetImageHash time taken:  6.57408809662
_GetImageHash time taken:  7.30649399757
_GetImageHash time taken:  7.26073002815
_GetImageHash time taken:  7.94218182564

It seems that this is happening because I am storing the lists. But why does performance suffer as memory usage grows? Can anything be done so that memory usage does not have such a drastic impact on run time?

Upvotes: 1

Views: 132

Answers (2)

Benoit Bertholon

Reputation: 765

You might try disabling the garbage collector:

import gc
gc.disable()

# your code

gc.enable()
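To make sure the collector is always re-enabled even if the hashing code raises, you can wrap the calls in try/finally. Here is a minimal sketch; `hash_func` is a stand-in for the question's _GetImageHash:

```python
import gc

def hash_images_without_gc(images, hash_func):
    """Run hash_func over each image with the cyclic GC paused.

    hash_func stands in for the question's _GetImageHash. The
    try/finally guarantees gc.enable() runs even if hashing raises.
    """
    results = []
    gc.disable()
    try:
        for image in images:
            results.append(hash_func(image))
    finally:
        gc.enable()
    return results
```

The slowdown comes from the collector traversing every tracked container object (all those stored lists) on each collection pass, so pausing it during the batch avoids that repeated, growing traversal.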

Upvotes: 1

AFoglia

Reputation: 8148

I don't know how big your images are, but if you think it's a memory issue, I would start by checking how much memory the process is using. You can either find a recipe online to call in-process (such as here), or just track the memory usage in your OS's process monitor.

If it is memory usage, the first thing I would do is replace the plain list image_list with something more compact. Numpy arrays would be ideal, but even the standard library module array should help.

I say should, because if the values in image_list are all small integers (below about 256), Python caches them and does not allocate a new int object per pixel. The list still stores a pointer for each one, though. If you make your array hold 4-byte (8-byte) values, that is the same as the pointer size a list uses on a 32-bit (64-bit) system, so choose a smaller element type if the pixel range allows it. I haven't used PIL, so I'm unfamiliar with the return of pil.Image.getdata.
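For example, here is a sketch of the row hashing done over an array of unsigned bytes instead of a list; it assumes a single-band image whose pixel values fit in 0..255 (the function name and parameters are illustrative, not from PIL):

```python
from array import array

def row_hashes_compact(pixels, columns):
    """Hash each image row while storing pixels compactly.

    pixels: a flat iterable of pixel values, as getdata() would
    return for a single-band image. Typecode 'B' stores each pixel
    in one byte rather than one pointer-sized list slot, and the
    array's buffer is not tracked by the cyclic garbage collector.
    """
    data = array('B', pixels)          # unsigned bytes, 0..255
    rows = len(data) // columns
    return [hash(tuple(data[i * columns:(i + 1) * columns]))
            for i in range(rows)]
```

The hashes come out identical to the list-based version, since tuple() yields the same integer values either way; only the storage underneath is smaller.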

Upvotes: 0
