sandeepd

Reputation: 515

How to keep memcache and datastore in sync

Suppose I have a million users registered with my app. A new user signs up, and I want to show him which of his contacts have this app installed. A user can have many contacts, say 500. Fetching an entity for each contact from the datastore is costly in both time and money. Memcache is a good option, but I'd have to keep it in sync for that Kind. I can get dedicated memcache for such a large data set, but how do I sync it? My logic would be: if a contact is not in memcache, assume that contact is not registered with this app. A backend module with manual scaling could be used to keep both in sync. But I don't know how good this design is. Any help will be appreciated.

Upvotes: 0

Views: 468

Answers (3)

Gwyn Howell

Reputation: 5424

This is not how memcache is designed to be used. You should never rely on memcache. Keys can drop at any time. Therefore, in your case, you can never be sure if a contact exists or not.

I'm not sure what your problem with the datastore is. The datastore is designed to read data very fast, so take advantage of it.

When new users install your app, create a lookup entity with the phone number as the key. You don't necessarily need any other properties. Something like this:

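DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
// The phone number is the key name; the entity needs no properties.
Entity contactLookup = new Entity("ContactLookup", "somePhoneNumber");
datastore.put(contactLookup);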

That will keep a log of who's got the app installed.

Then, to check which of your user's contacts are already using your app, you can build a set of keys from the phone numbers in the user's address book (with their permission, of course!) and perform a batch get. Something like this:

Set<Key> keys = new HashSet<Key>();

for (String phoneNumber : phoneNumbers)
    keys.add(KeyFactory.createKey("ContactLookup", phoneNumber));

Map<Key, Entity> entities = datastore.get(keys);

Now, entities will contain only those contacts that have your app installed, because the batch get returns entries only for keys that exist.

You may need to batch the keys to reduce load. The Python API does this for you, but I'm not sure about the Java APIs. Even if your user has 500 contacts, it's only 5 batch gets (assuming batches of 100).
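A minimal sketch of the batching step, assuming batches of 100 as above. `KeyBatcher` and `partition` are hypothetical names, not part of the App Engine SDK; in the real app the list would hold `Key` objects and each batch would be passed to `datastore.get(...)`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the App Engine SDK.
final class KeyBatcher {
    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sub-list so each batch is independent of the input list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

With 500 keys and a batch size of 100 this yields 5 batches, so the lookup costs 5 batch gets.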

Side note: you may want to consider hashing phone numbers for storage.
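One possible way to hash the numbers before using them as key names, sketched with the JDK's `MessageDigest`. `PhoneHasher` and its methods are hypothetical names, and stripping everything except digits is an assumption about how numbers should be normalized, not something from the answer. Phone numbers are a small space to search, so an unsalted hash is closer to obfuscation than to strong protection.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the App Engine SDK.
final class PhoneHasher {
    // Assumption: numbers are compared on their digits only.
    static String normalize(String phoneNumber) {
        return phoneNumber.replaceAll("[^0-9]", "");
    }

    // Returns the lowercase hex SHA-256 digest of the normalized number.
    static String sha256Hex(String phoneNumber) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(
                    normalize(phoneNumber).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

The resulting string would replace the raw number as the key name, both when writing the lookup entity and when building the keys for the batch get.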

Upvotes: 3

Andrei Volgin

Reputation: 41089

Memcache is a good option to reduce costs and improve performance, but you should not assume that it is always available. Even a dedicated Memcache may fail or an individual record can be evicted. Besides, all this synchronization logic will be very complicated and error-prone.

You can use Memcache to indicate if a contact is registered with the app, in which case you do not have to check the datastore for that contact. But I would recommend checking all contacts not found in Memcache in the Datastore.
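A rough sketch of that lookup order, with plain in-memory sets standing in for Memcache and the Datastore so it runs without the App Engine SDK. `RegisteredContactLookup` and `findRegistered` are hypothetical names; in a real app the second loop would be a single batch get on the missed keys.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; the sets are stand-ins for Memcache and the Datastore.
final class RegisteredContactLookup {
    static Set<String> findRegistered(Set<String> contacts,
                                      Set<String> cache,
                                      Set<String> store) {
        Set<String> registered = new HashSet<>();
        Set<String> misses = new HashSet<>();
        for (String contact : contacts) {
            if (cache.contains(contact)) {
                // A cache hit is enough to treat the contact as registered.
                registered.add(contact);
            } else {
                misses.add(contact);
            }
        }
        // A cache miss proves nothing, so ask the authoritative store.
        for (String contact : misses) {
            if (store.contains(contact)) {
                registered.add(contact);
            }
        }
        return registered;
    }
}
```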

Verifying whether a record is present in the Datastore is fast and inexpensive. You can use the .get(java.lang.Iterable<Key> keys) method to retrieve the entire list with a single datastore call.

You can further improve performance by creating an entity with no properties for registered users. This way there will be no overhead in retrieving these entities.

Upvotes: 1

Patrice

Reputation: 4692

Since you don't use Python and therefore don't have access to NDB, my suggestion is this: when you add a user, add him to memcache and start an asynchronous write (or a task queue job) to push the same data to your datastore. That way memcache is written first, and the datastore eventually follows. They'll always be in sync.

Then all you need to do is query memcache first when you do "gets" (because memcache is always in sync since you push there first), and if memcache returns empty (being volatile and whatnot), query the actual datastore to refill memcache.
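The write and read paths described here could look roughly like this, with two in-memory maps standing in for memcache and the datastore. `ReadThroughCache` and its methods are hypothetical names, and the store write is done synchronously for simplicity where the answer suggests an async call or a task queue job.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the maps are stand-ins for memcache and the datastore.
final class ReadThroughCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> store = new HashMap<>();

    // Write path: cache first, then the store.
    void put(String key, String value) {
        cache.put(key, value);
        store.put(key, value); // assumption: async or queued in a real app
    }

    // Read path: try the cache, fall back to the store, refill on a hit there.
    String get(String key) {
        String value = cache.get(key);
        if (value != null) {
            return value;
        }
        value = store.get(key);
        if (value != null) {
            cache.put(key, value);
        }
        return value;
    }

    // Simulates memcache dropping a key.
    void evict(String key) {
        cache.remove(key);
    }

    boolean isCached(String key) {
        return cache.containsKey(key);
    }
}
```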

Upvotes: 0
