John Lunky

Reputation: 11

Solution to storing 300MB in memory for Google App Engine

I am using Google App Engine in Python. I have 5000 people in my database. The entire list of 5000 Person objects takes up 300 MB of memory.

I have been trying to store this in memory using blobcache, a module written [here][1].

I am running into pickle "OutOfMemory" issues, and am looking for a solution that involves storing these 5000 objects into a database, and then retrieving them all at once.

My person model looks like this.

from google.appengine.ext import db

class PersonDB(db.Model):
    serialized = db.BlobProperty()
    pid = db.StringProperty()
Each person is an object with many attributes and methods associated with it, so I decided to pickle each person object and store it in the serialized field. The pid just allows me to query the person by their id. My Person class looks something like this:

class Person():
    def __init__(self, sex, mrn, age):
        self.sex = sex
        self.age = age  # exact age
        self.record_number = mrn
        self.locations = []

    def makeAgeGroup(self, ageStr):
        ageG = ageStr
        return int(ageG)

    def addLocation(self, healthdistrict):
        self.locations.append(healthdistrict)

When I store all 5000 people at once into my database, I get a Server 500 error. Does anyone know why? My code for this is as follows:

# people is my list of 5000 Person objects
def write_people(self, people):
    for person in people:
        personDB = PersonDB()
        personDB.serialized = pickle.dumps(person)
        personDB.pid = person.record_number
        personDB.put()
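
Would it help to batch the puts instead? This is roughly what I have in mind (just a sketch; I believe db.put() accepts a list of entities, so the writes would go out in chunks rather than one RPC per person):

def write_people_batched(self, people, batch_size=100):
    # Build the entities in memory and write them in chunks with db.put(),
    # which takes a list, instead of calling put() once per person.
    batch = []
    for person in people:
        personDB = PersonDB()
        personDB.serialized = pickle.dumps(person)
        personDB.pid = person.record_number
        batch.append(personDB)
        if len(batch) >= batch_size:
            db.put(batch)
            batch = []
    if batch:
        db.put(batch)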

How would I retrieve all 5000 of these objects at once in my App Engine method?

My idea is to do something like this:

def get_patients(self):
    # Get my list of 5000 people back from the database
    people_from_db = db.GqlQuery("SELECT * FROM PersonDB")
    people = []
    for person in people_from_db:
        people.append(pickle.loads(person.serialized))
    return people
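
Or would it be better to fetch them explicitly instead of iterating the query? Something like this (again just a sketch; I'm assuming fetch() with an explicit limit is the way to pull them all back in one call):

def get_patients_fetch(self):
    # Pull up to 5000 entities in one fetch() call, then unpickle
    # each stored blob back into a Person object.
    entities = db.GqlQuery("SELECT * FROM PersonDB").fetch(5000)
    return [pickle.loads(e.serialized) for e in entities]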

Thanks in advance for the help; I've been stuck on this for a while!

Upvotes: 1

Views: 633

Answers (3)

PanosJee

Reputation: 3866

You could also check out the PerformanceEngine project: https://github.com/ocanbascil/PerformanceEngine

Upvotes: 0

Jan Z

Reputation: 612

For this size of data, why not use the blobstore and memcache?

In terms of performance (from highest to lowest):

  • local instance memory (your data set is too large)
  • memcache (partition your data into several keys and you should be fine, and it's very fast! See the sketch below.)
  • blobstore + memcache (persist to blobstore rather than DB)
  • db + memcache (persist to db)

Check out the Google I/O videos from this year; there is a great one on using the blobstore for exactly this sort of thing. There is a significant performance (and cost) penalty associated with the DB for some use cases.

(for the pedantic readers, the read performance of the last three will be effectively the same, but there are significant differences in write time/cost)
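
For the memcache option, the partitioning can be as simple as pickling the list once and splitting the bytes across several keys (a rough sketch; the key names and the ~1MB-per-value chunk size are my own assumptions):

from google.appengine.api import memcache
import pickle

CHUNK_SIZE = 950000  # stay safely under memcache's ~1MB per-value limit

def cache_people(people, prefix='people'):
    # Pickle the whole list once, then spread the bytes over several keys.
    data = pickle.dumps(people, pickle.HIGHEST_PROTOCOL)
    chunks = {}
    for i in range(0, len(data), CHUNK_SIZE):
        chunks['%s:%d' % (prefix, i // CHUNK_SIZE)] = data[i:i + CHUNK_SIZE]
    memcache.set_multi(chunks)
    memcache.set('%s:count' % prefix, len(chunks))

def get_cached_people(prefix='people'):
    # Reassemble the chunks; on any miss, fall back to blobstore/db.
    count = memcache.get('%s:count' % prefix)
    if count is None:
        return None
    keys = ['%s:%d' % (prefix, i) for i in range(count)]
    parts = memcache.get_multi(keys)
    if len(parts) != count:
        return None  # a chunk was evicted
    return pickle.loads(''.join(parts[k] for k in keys))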

Upvotes: 0

recursive

Reputation: 86084

You should not have all 5000 users in memory at once. Only retrieve the one you need.
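
For example (a sketch, assuming you already know the pid of the person you need):

def get_person(self, pid):
    # Fetch a single PersonDB entity by pid and unpickle just that one.
    entity = db.GqlQuery("SELECT * FROM PersonDB WHERE pid = :1", pid).get()
    if entity is None:
        return None
    return pickle.loads(entity.serialized)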

Upvotes: 2
