I'm looking for advice on how to improve the speed of the following.
My data model:
```python
from google.appengine.ext import ndb

class Events(ndb.Model):
    eventid = ndb.StringProperty(required=True)
    participants = ndb.StringProperty(repeated=True)
```
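How I fetch the data:

```python
from google.appengine.ext import ndb

import database  # the module that defines the Events model above


def GetEventDataNotCached(eventslist):
    # Start one keys-only query per event ID; the queries run concurrently.
    futures = []
    for eventid in eventslist:
        if eventid is not None:
            ke = database.Events.query(database.Events.eventid == eventid)
            future = ke.get_async(keys_only=True)
            futures.append(future)

    # Wait for each query and collect the matching keys.
    eventskeys = []
    for future in futures:
        eventkey = future.get_result()
        if eventkey is not None:  # get() yields None when nothing matches
            eventskeys.append(eventkey)

    # Fetch all the entities in a single batch.
    data = ndb.get_multi(eventskeys)
    return data
```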
So I get the keys asynchronously and then pass them to "get_multi". Is there any other way to make this faster? I'm still not happy with the performance.
The repeated property can hold up to a couple of hundred strings. There are several tens of thousands of rows in the Events model. The eventslist contains only a few dozen event IDs that I want to fetch.
Answer:
I have found that deserializing long lists (i.e., large repeated=True properties) from the protocol buffer is very slow.
Have you looked at this in Appstats? Do you see a large gap of whitespace after your get_multi() where no RPC is executing? That is the deserialization overhead.
The only way I've found to overcome this is to remove the long lists and manage them in a separate model (i.e., avoid the long repeated property lists altogether), but of course, that may not be possible for your use case.
So the big question is: do you really need all the participants when you get the list of events, or can you defer that lookup in some way? For example, it might be cheaper and faster to fetch all the events synchronously, then kick off async fetches for each event's participants (stored in a different model) and combine the results in memory. Perhaps you only need the 25 most recently registered participants or something similar, which would limit the cost of your sub-queries.
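A minimal sketch of that fetch-then-fan-out pattern, under stated assumptions: plain Python dictionaries stand in for an events model and a separate participants model (no NDB calls are made), and a thread pool stands in for NDB's async futures. `fetch_recent_participants` and `get_events_with_recent_participants` are hypothetical names, and the limit of 25 simply mirrors the suggestion above.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for two models:
# EVENTS holds small event records with no participant list;
# PARTICIPANTS holds (registered_at, name) rows keyed by event ID,
# playing the role of a separate participant model.
EVENTS = {
    "e1": {"eventid": "e1", "title": "Launch"},
    "e2": {"eventid": "e2", "title": "Meetup"},
}
PARTICIPANTS = {
    "e1": [(1, "ann"), (5, "bob"), (3, "cy")],
    "e2": [(2, "dee")],
}


def fetch_recent_participants(eventid, limit):
    """Stand-in for a per-event sub-query ordered newest first, with a limit."""
    rows = PARTICIPANTS.get(eventid, [])
    newest = heapq.nlargest(limit, rows)  # largest registered_at first
    return [name for _, name in newest]


def get_events_with_recent_participants(eventids, limit=25):
    # 1. Fetch the (small) event records up front.
    events = [EVENTS[eid] for eid in eventids if eid in EVENTS]
    # 2. Kick off one bounded participant lookup per event, concurrently.
    with ThreadPoolExecutor() as pool:
        futures = {
            ev["eventid"]: pool.submit(fetch_recent_participants, ev["eventid"], limit)
            for ev in events
        }
        # 3. Combine the results in memory (copies; EVENTS is left untouched).
        return [
            dict(ev, participants=futures[ev["eventid"]].result())
            for ev in events
        ]


result = get_events_with_recent_participants(["e1", "e2"], limit=2)
```

With real NDB, step 2 would presumably be an ordered query per event started with `fetch_async(limit)`, with `get_result()` called on each future. The design point is that each event carries only a bounded slice of participants rather than a repeated property of several hundred strings.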
Answer:
An improvement in simplicity and execution speed but not cost could be:
```python
data = database.Events.query(database.Events.eventid.IN(eventslist)).fetch(100)
```
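The reason this is simpler but not cheaper: NDB implements an `IN` filter by running a separate equality query for each value in the list and merging the results, so the number of underlying queries still grows with `len(eventslist)`. Below is a small pure-Python model of that behavior; the rows and helper names are hypothetical stand-ins, not NDB APIs.

```python
# Hypothetical in-memory rows standing in for Events entities; "key" plays
# the role of the entity key and "eventid" the indexed property.
ROWS = [
    {"key": "k1", "eventid": "e1"},
    {"key": "k2", "eventid": "e2"},
    {"key": "k3", "eventid": "e3"},
]


def equality_query(eventid, stats):
    """Stand-in for one `eventid == value` query; counts each one issued."""
    stats["queries"] += 1
    return [row for row in ROWS if row["eventid"] == eventid]


def in_query(eventids, limit=100):
    """Model an IN filter: one equality sub-query per value, results merged."""
    stats = {"queries": 0}
    merged, seen = [], set()
    for value in eventids:
        for row in equality_query(value, stats):
            if row["key"] not in seen:  # de-duplicate entities matched twice
                seen.add(row["key"])
                merged.append(row)
    return merged[:limit], stats["queries"]


data, queries_issued = in_query(["e1", "e3", "missing"])
print(queries_issued)  # 3
```

Three IDs produce three sub-queries even though only two of them match anything.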
The next step is to use the eventid as the id in the key, creating entities like

```python
event = database.Events(id=eventid, ...)
```

in which case you can do

```python
data = ndb.get_multi([ndb.Key(database.Events, eventid) for eventid in eventslist])
```

which is faster and len(eventslist)*6 times cheaper.
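To make the difference concrete, here is a small pure-Python model that counts datastore operations for the question's query-then-get approach versus keying entities by eventid. `FakeDatastore` and both fetch functions are hypothetical stand-ins, not NDB APIs; keys are modeled as `(kind, id)` tuples. The model counts operations only and does not simulate latency, concurrency, or billing.

```python
class FakeDatastore:
    """Hypothetical key-value store that counts operations (not an NDB API)."""

    def __init__(self):
        self.by_key = {}
        self.ops = 0

    def put(self, key, entity):
        self.by_key[key] = entity

    def query_key_by_eventid(self, eventid):
        # Stand-in for Events.query(Events.eventid == eventid).get(keys_only=True)
        self.ops += 1
        for key, entity in self.by_key.items():
            if entity["eventid"] == eventid:
                return key
        return None

    def get_multi(self, keys):
        # Stand-in for ndb.get_multi: one batch call, None for missing keys.
        self.ops += 1
        return [self.by_key.get(key) for key in keys]


def fetch_via_queries(store, eventids):
    """The question's approach: one keys-only query per ID, then a batch get."""
    keys = [store.query_key_by_eventid(e) for e in eventids]
    return store.get_multi([k for k in keys if k is not None])


def fetch_via_named_keys(store, eventids):
    """Entities keyed by eventid: build the keys directly, one batch get."""
    return store.get_multi([("Events", e) for e in eventids])


ids = ["e1", "e2", "e3"]
auto = FakeDatastore()   # entities stored under auto-allocated numeric IDs
named = FakeDatastore()  # entities stored with eventid as the key ID
for n, eventid in enumerate(ids):
    auto.put(("Events", 1000 + n), {"eventid": eventid})
    named.put(("Events", eventid), {"eventid": eventid})

via_queries = fetch_via_queries(auto, ids)
via_keys = fetch_via_named_keys(named, ids)
print(auto.ops, named.ops)  # 4 1
```

Both approaches return the same entities, but the keyed version skips the per-ID queries entirely. One behavioral difference to keep in mind: a batch get by key returns `None` for IDs that do not exist, so callers may need to filter those out.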