user2924127

Reputation: 6242

Google Datastore

I was planning on using MySQL for this, but Google Datastore looks like it could be a possible solution, and I wouldn't have to worry about managing the server, which is a big plus. I need to store one thing: messages. I was planning on having one MySQL messaging table which would contain messages from users to other users.

I am hoping I can somehow translate my MySQL message table effectively into Google Datastore. My requirements, and my current MySQL solution, are as follows:

A Message can only be sent by one user to one other user. When a message is read for the first time it needs to be updated to say it has been read, so it will require one update in its life if it is ever read. A User must be able to view all their sent messages ordered by newest first, and all the messages they received ordered by newest first.

My initial design looks like this:

Messages Table:

Message_ID (BIGINT) PK auto_increment
FromUser (INT 16)
ToUser (INT 16)
DateCreated (Timestamp)
MessageText (Varchar (500) )
HasRead (TINYINT (0=false, 1=true) )

The queries which will be run will include:

SELECT * FROM messages WHERE FromUser = '10000000' ORDER BY DateCreated DESC LIMIT [starting pagination value], 10 ; // get all of the user's sent messages - newest first, fetch 10 at a time

SELECT * FROM messages WHERE ToUser = '10000000' ORDER BY DateCreated DESC LIMIT [starting pagination value], 10 ; // get all of the user's received messages - newest first, fetch 10 at a time

UPDATE messages SET HasRead = 1 WHERE Message_ID = '123456789'; // When a message has been read update it to show it has been read.

As a result I would have the following indexes:

- Message_ID (primary index)
- (FromUser, DateCreated) BTREE DESC
- (ToUser, DateCreated) BTREE DESC

An additional "would like to have, but can live without it if it affects performance considerably or is too complicated to implement" requirement would be for a user to view their messages with one other user:

SELECT * FROM messages WHERE (FromUser = '1000000' AND ToUser = '2000000') OR (FromUser = '2000000' AND ToUser = '1000000') ORDER BY DateCreated DESC LIMIT [starting pagination value], 10 ; // Get all the messages sent between user 1000000 and user 2000000 - newest first, fetch 10 at a time.

I was thinking of creating a new column which would be a concatenation of [smaller_userid]-[larger_userid] and searching on this field instead. If that were the case I would have an additional composite index on this new column + DateCreated:

SELECT * FROM messages WHERE concateduser = '1000000-2000000' ORDER BY DateCreated DESC LIMIT [starting pagination value], 10 ;
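If I go the concatenated-column route, a small helper for building that canonical key could look like the sketch below (the function name and the "-" separator are just assumptions for illustration):

def conversation_key(user_a, user_b):
    # Canonical "[smaller_userid]-[larger_userid]" value, so that
    # (1000000, 2000000) and (2000000, 1000000) map to the same key.
    low, high = sorted((int(user_a), int(user_b)))
    return "{0}-{1}".format(low, high)

# e.g. stored in the concateduser column at insert time:
assert conversation_key(2000000, 1000000) == "1000000-2000000"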

We believe this will work, but we currently have an active community of users and estimate that when we roll out the feature it will be heavily used. Thus we would also like to plan for the future so we are able to scale (premature now, but we think this is a very simple feature and are hoping to design it well now to save ourselves time later).

So this is what I was planning on doing with MySQL. Could this be translated into Google Datastore in a way that would be effective?

I looked into some solutions, but they would be quite expensive. My first thought was to make every message an entity. This entity would then have indexes on the To, From, and hash(user1 + user2) attributes, which would let me fetch messages by recipient, by sender, and by user pair. The problem is that the writes would be expensive, and the update when a message is read would be expensive too. The queries are also expensive: if I load 10 messages initially it will take 1 + 10 read units.

I tried another solution of storing N messages inside one entity. For each user I would need to store which entity they are currently at, which becomes expensive too: for each message I would need to write the same data twice (once for each user), updates have to happen to both sets, and there are additional reads to find out which entity the users are currently at when inserting or reading messages.

I am looking for a better solution to this simple requirement.
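For reference, the one-entity-per-message layout I describe might be sketched like this in Python with the ndb client library (google.cloud.ndb); the property names here are just illustrative assumptions:

from google.cloud import ndb


class Message(ndb.Model):
    # One entity per message, indexed on sender, recipient and the canonical
    # user-pair key so all three query patterns from the MySQL design work.
    from_user = ndb.IntegerProperty()
    to_user = ndb.IntegerProperty()
    pair = ndb.StringProperty()        # "[smaller_userid]-[larger_userid]"
    text = ndb.TextProperty()          # TextProperty is not indexed
    created = ndb.DateTimeProperty(auto_now_add=True)
    has_read = ndb.BooleanProperty(default=False)


def received_messages(client, user_id, limit=10):
    # Equivalent of "WHERE ToUser = ? ORDER BY DateCreated DESC LIMIT 10";
    # needs a composite index on (to_user, -created) and costs roughly
    # one query plus `limit` entity reads.
    with client.context():
        return (Message.query(Message.to_user == user_id)
                .order(-Message.created)
                .fetch(limit))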

Upvotes: 1

Views: 151

Answers (1)

thoutbeckers

Reputation: 2668

It all depends on your definition of expensive.

Your one-message-one-entity design of course seems the most sensible.

With 4 writes (entity + indexes) sending a million messages would cost:

(0.6 / 100000) * 4 * 1000000 = 24 cents

Another 24 cents for marking them as read

If we assume every message will lead to at least one read query (for a group of 10, i.e. one query plus 10 entity reads):

(0.6 / 100000) * 11 * 1000000 = 66 cents

If we assume every message takes about 150 characters, then storing a million of them will cost:

(1000000 * 150 / 1024000000) * 0.18 = about 2.6 cents for a month

But this is cumulative, so after one year you'll be paying about 31.2 cents per month, and so on.
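For what it's worth, here is the same arithmetic written out in Python, treating the 0.6 figure as cents per 100,000 operations and the 0.18 as dollars per GB-month, which is what the totals above imply (actual Datastore pricing may differ):

MESSAGES = 1000000
WRITE_OPS_PER_MESSAGE = 4        # entity + index writes, as assumed above
CENTS_PER_100K_OPS = 0.6         # assumption implied by the formulas above
STORAGE_CENTS_PER_GB_MONTH = 18  # i.e. $0.18 per GB-month
AVG_MESSAGE_BYTES = 150

write_cents = (CENTS_PER_100K_OPS / 100000) * WRITE_OPS_PER_MESSAGE * MESSAGES
read_cents = (CENTS_PER_100K_OPS / 100000) * 11 * MESSAGES  # 1 query + 10 reads
storage_cents = (MESSAGES * AVG_MESSAGE_BYTES / 1024000000.0) * STORAGE_CENTS_PER_GB_MONTH

print(write_cents)    # ~24 cents to send a million messages
print(read_cents)     # ~66 cents if each message triggers one page of reads
print(storage_cents)  # ~2.6 cents per month, and cumulative over time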

So what do we find out? Reading is relatively expensive, or rather querying is expensive, and you'll be doing a lot of it. There are also likely scenarios you haven't taken into account yet (e.g. what if you want to show the user how many unread messages they have?).

So you'll want to aggressively memcache these queries, but then you'll find out about eventual consistency. In short, with Datastore a query does not (always) include all results of the writes you did beforehand (only eventually). This makes caching hard, because it's hard to know when your query will be up to date.

You'll likely end up using Ancestors and Entity Groups to solve this: essentially, your messages will be grouped under a root Entity that represents the user. Since this provides strong consistency and atomicity, you'll find that it gives you a natural place to keep certain information. This user-related entity can store things like the unread message count, or even a list of the IDs of all unread messages, the last 10 messages, etc., in a single Entity.

This can then be aggressively memcached (upon write), as can the messages themselves. This means in an ideal situation you'll only write the message and the user entity, and you can show the unread message count, the first 10 messages and the message itself when a user clicks it without a single query or even read.
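A minimal sketch of that entity-group layout, using the Python ndb client library (google.cloud.ndb); the kind and property names (UserInbox, unread_count, and so on) are illustrative assumptions rather than anything Datastore prescribes:

from google.cloud import ndb


class UserInbox(ndb.Model):
    # Root entity per user (keyed by user id). This is the entity you would
    # also keep in memcache: unread count, recent message ids, etc.
    unread_count = ndb.IntegerProperty(default=0)
    recent_message_ids = ndb.IntegerProperty(repeated=True)


class Message(ndb.Model):
    # Stored as a child of the recipient's UserInbox, so "my received
    # messages" is an ancestor query and therefore strongly consistent.
    # Sent messages could still be fetched with a regular (eventually
    # consistent) non-ancestor query on `sender`.
    sender = ndb.IntegerProperty()
    recipient = ndb.IntegerProperty()
    text = ndb.TextProperty()
    created = ndb.DateTimeProperty(auto_now_add=True)
    has_read = ndb.BooleanProperty(default=False)


def latest_received(client, user_id, limit=10):
    # Strongly consistent inbox page, scoped to one user's entity group.
    with client.context():
        inbox_key = ndb.Key("UserInbox", user_id)
        return (Message.query(ancestor=inbox_key)
                .order(-Message.created)
                .fetch(limit))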

Ancestor queries come with a cost, not in cents but in write throughput: writes will be limited to (at worst) one per second in total for all the Entities in one group (so for one user). That should not be a problem (unless you expect users to get messages more often than every few seconds), but you still have to take into account that writes will sometimes fail and have to be retried.
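Building on the UserInbox and Message classes sketched above, the write path under that constraint might look like the following; ndb transactions retry on contention, and the retries value here is only made explicit for illustration (a sketch, not a prescribed pattern):

from google.cloud import ndb


@ndb.transactional(retries=3)
def deliver_message(recipient_id, sender_id, text):
    # Both writes touch the recipient's entity group only, so they commit
    # atomically -- but that group sustains roughly one such transaction
    # per second, hence the retries.
    inbox_key = ndb.Key("UserInbox", recipient_id)
    inbox = inbox_key.get() or UserInbox(key=inbox_key)
    inbox.unread_count += 1
    message = Message(parent=inbox_key, sender=sender_id,
                      recipient=recipient_id, text=text)
    ndb.put_multi([inbox, message])
    return message.key

# Must be called inside an ndb client context, e.g.:
#   with ndb.Client().context():
#       deliver_message(2000000, 1000000, "hello")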

Choosing whether or not to use ancestors is an important decision; it's not impossible, but it is very difficult to migrate to this model later. Personally, though, I'd hold off on other aggressive performance optimizations until you see actual costs. Essentially, performance will stay the same in this model whether you're doing a million messages per month or per second, and your money might be better spent on other things that increase revenue from users rather than on trying to squeeze the last few cents out of Datastore.

Upvotes: 3
