Reputation: 327
I have the following class:
    class C1:
        STORE = []
        TYPE = []
        ITEMS = []
        PRICE = []

        def __init__(self, STORE, TYPE, ITEMS, PRICE):
            self.STORE = STORE
            self.TYPE = TYPE
            self.ITEMS = ITEMS
            self.PRICE = PRICE
The purpose of the class is to store all the items in different stores and their prices. Items are alphanumerically ordered, and if an item doesn't exist in a store, its price is recorded as 0. I am retrieving the data from a table in a database that looks like this:
              S1    S2    S3   ....   S29000
    item1     15     2    30   ....      100
    item2      0     1     0   ....        5
    ...
    item600   30   190    10   ....       25
The STORE and ITEMS lists look like this:

    STORE: ['S1', 'S2', ..., 'S29000']
    ITEMS: ['item1', 'item2', ..., 'item600']
PRICE is a multidimensional list: indexing it by store index and then item index gives the price of the specified item at the specified store (e.g. PRICE[0][0] is the price of item1 at S1, which is 15).
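For illustration, here is a minimal sketch of that layout, assuming PRICE is indexed store-first; the values follow the table excerpt above:

    # Minimal sketch of the layout: PRICE[store_index][item_index].
    STORE = ['S1', 'S2', 'S3']
    ITEMS = ['item1', 'item2']
    PRICE = [[15, 0],    # prices at S1: item1 = 15, item2 = 0
             [2, 1],     # prices at S2
             [30, 0]]    # prices at S3
    print PRICE[0][0]    # price of item1 at S1 -> 15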
With all this data in a class, I run 'reports' with some 'complex' computation involved.
I am running into the problem that, as I create class objects from different tables, Python's memory usage reaches almost 1.8 GB according to Windows Task Manager.
I know that my objects are the main reason for the memory consumption, which brings me to two questions:

1. I del every class object after I use it and run gc.collect(), yet it appears Python doesn't want to let go of the memory even after the script is done. Is there a brute-force method to free it up?

2. Are there data structures other than lists which would consume less memory and improve my performance? I noticed NumPy as an option, but I am forced to use Python 2.3, and it appears NumPy is not compatible with it.
I had previously tried reading the database every time I wanted to compute something, which made my program run for almost 3 hours; now, storing the data in these classes, it takes 40 minutes. So my 'client' doesn't want to go back to the old way.
Thanks in advance.
EDIT: The original data looks like the table that I described previously, but it is only accessible through the API provided by the vendor.
EDIT2: My intention is to create various objects of type C1 for multiple 'data sources', so I would end up with six C1 objects, each containing distinct data.
EDIT3: To access the list of item prices per store, the API provides a single function of the form GetPrices('Store Name'), so it is necessary to call this function for every store. Therefore, the code that currently generates the C1 objects has one huge for loop that calls this function for every store, roughly like this:
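    # Rough shape of the loading loop: one API round trip per store,
    # i.e. 29000 calls of the vendor's GetPrices in total.
    PRICE = []
    for store in STORE:
        PRICE.append(GetPrices(store))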
Upvotes: 1
Views: 903
Reputation: 41556
Using Python 2.3 is going to limit your options (including ruling out a move to 64-bit). That's also the main reason the memory is not being released back to the OS: CPython's internal object allocator didn't gain the ability to return no-longer-used memory to the OS until 2.5.
If you can, try running the algorithm on 2.7 and see what gains you're able to achieve purely by using a more recent version of the interpreter (or what compatibility problems would arise in such a migration).
And, as others have suggested, optimise your data structures. Check the algorithmic complexity of the operations you perform regularly, and see if there is a way to convert O(n^2) operations to O(n log n), and O(n) to O(log n) or O(1).
Even if the underlying data structures can't change, you may be able to use the bisect module to speed up some operations on your lists.
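For instance, a sketch of a bisect-based lookup, assuming ITEMS stays sorted as the question states:

    import bisect

    def item_index(items, name):
        # Binary search in the sorted ITEMS list: O(log n) per lookup
        # instead of the O(n) scan performed by items.index(name).
        i = bisect.bisect_left(items, name)
        if i < len(items) and items[i] == name:
            return i
        raise KeyError(name)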
Upvotes: 1
Reputation: 17389
You have an 800 x 30,000 matrix, which is 24,000,000 elements per matrix. That's already about 100 MB of space if they're raw floats, and more with Python's per-object overhead. And you have six of these beasts?
If 1.8 GB is too much then you'll have to store less. Sorry, but this is why real number crunching can be hard. Make sure you only have the data you need, and that's it.
If most of that matrix is empty then I'd suggest looking at some sparse matrix libraries. SciPy/NumPy have the most common ones, but I'm sure someone else provides something workable with Python 2.3. Maybe an old NumPy?
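Failing that, a plain dict keyed on (store, item) index pairs works as a crude sparse store even on 2.3; a sketch, assuming the PRICE layout from the question:

    # Keep only the non-zero prices; absent entries are the zeros.
    sparse = {}
    for s in range(len(STORE)):
        for i in range(len(ITEMS)):
            p = PRICE[s][i]
            if p:
                sparse[(s, i)] = p

    def get_price(s, i):
        return sparse.get((s, i), 0)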
Upvotes: 1
Reputation: 15722
It's hard to say anything without knowing more about your algorithm, but perhaps the struct module (http://docs.python.org/library/struct.html) would be an option? Or Cython? Or Pyrex?
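For example, a struct sketch, assuming every price fits in an unsigned 16-bit integer:

    import struct

    # Pack one store's 600 prices into 1200 bytes instead of keeping
    # 600 separate Python int objects (assumes prices are in 0..65535).
    prices = [15, 0, 30] + [0] * 597
    packed = struct.pack('600H', *prices)
    print len(packed)                     # -> 1200
    row = struct.unpack('600H', packed)   # back to a tuple of ints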
Upvotes: 0
Reputation: 9769
You might try the array module; it was already around in Python 2.3. Other than that, you may want to look into using a proper database for this stuff.
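A quick sketch, again assuming prices fit in an unsigned short:

    from array import array

    # A typed array stores about 2 bytes per 'H' (unsigned short) entry
    # instead of a full Python int object per price.
    row = array('H', [15, 0, 30] + [0] * 597)
    print row[0]     # -> 15
    print len(row)   # -> 600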
Upvotes: 3
Reputation: 363817
> Items are alphanumerically ordered [...]

> With all this data in a class, I run 'reports' with some 'complex' computation involved.

> Are there data structures other than lists which would consume less memory and improve my performance?
I'm just guessing at your algorithms here: linear-time searching? If so, using an OrderedDict may improve performance greatly.
That won't solve the memory issue, though; consider using a proper database package instead, e.g. SQLite + SQLAlchemy, or plain old bsddb with B-trees.
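Note that collections.OrderedDict only appeared in Python 2.7; on 2.3, a plain dict mapping item names to indices gives the same O(1) lookups. A sketch, reusing the ITEMS list from the question:

    # O(1) name-to-index lookups instead of scanning the sorted list.
    item_pos = {}
    for i in range(len(ITEMS)):
        item_pos[ITEMS[i]] = i

    print item_pos['item2']   # -> 1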
Upvotes: 0