Yannis
Yannis

Reputation: 6157

Model daily game ranking in DynamoDB

I have a question. I m pretty new to DynamoDB but have been working on large scale aggregation on SQL databases for a long time.

Suppose you have a table called GamePoints (PlayerId, GameId, Points) and would like to create a ranking table Rankings (PlayerId, Points) sorted by points.

This table needs to be updated on an hourly basis but keeping the previous version of its contents is not required. Just the current Rankings.

The query will always be give me the ranking table (with paging).

The GamePoints table will get very very large over time.

Questions:

Is this the best practice schema for DynamoDB ? How would you do this kind of aggregation?

Thanks

Upvotes: 2

Views: 2346

Answers (3)

cybermach
cybermach

Reputation: 125

An easy way to do this is by using DynamoDb's HashKey and Sort key. For example, the HashKey is the GameId and Sort key is the Score. You then query the table with a descending sort and a limit to get the real-time top players in O(1).

To get the rank of a given player, you can use the same technique as above: you get the top 1000 scores in O(1) and you then use BinarySearch to find the player's rank amongst the top 1000 scores in O(log n) on your application server.

If the user has a rank of 1000, you can specify that this user has a rank of 1000+. You can also obviously change 1000 to a greater number (100,000 for example).

Hope this helps.

Henri

Upvotes: 2

b-s-d
b-s-d

Reputation: 5153

The PutItem can be helpful to implement the persistence logic according to your Use Case:

PutItem Creates a new item, or replaces an old item with a new item. If an item that has the same primary key as the new item already exists in the specified table, the new item completely replaces the existing item. You can perform a conditional put operation (add a new item if one with the specified primary key doesn't exist), or replace an existing item if it has certain attribute values. Source: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_PutItem.html

In terms of querying the data, if you know for sure that you are going to be reading the entire Ranking table, I would suggest doing it through several read operations with minimum acceptable page size so you can make the best use of your provisioned throughput. See the guidelines below for more details:

Instead of using a large Scan operation, you can use the following techniques to minimize the impact of a scan on a table's provisioned throughput.

Reduce Page Size

Because a Scan operation reads an entire page (by default, 1 MB), you can reduce the impact of the scan operation by setting a smaller page size. The Scan operation provides a Limit parameter that you can use to set the page size for your request. Each Scan or Query request that has a smaller page size uses fewer read operations and creates a "pause" between each request. For example, if each item is 4 KB and you set the page size to 40 items, then a Query request would consume only 40 strongly consistent read operations or 20 eventually consistent read operations. A larger number of smaller Scan or Query operations would allow your other critical requests to succeed without throttling.

Isolate Scan Operations

DynamoDB is designed for easy scalability. As a result, an application can create tables for distinct purposes, possibly even duplicating content across several tables. You want to perform scans on a table that is not taking "mission-critical" traffic. Some applications handle this load by rotating traffic hourly between two tables – one for critical traffic, and one for bookkeeping. Other applications can do this by performing every write on two tables: a "mission-critical" table, and a "shadow" table.

SOURCE: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScanGuidelines.html#QueryAndScanGuidelines.BurstsOfActivity

You can also segment your tables by GameId (e.g. Ranking_GameId) to distribute the data more evenly and give you more granularity in terms of provisioned throughput.

Upvotes: 0

Alexander Patrikalakis
Alexander Patrikalakis

Reputation: 5195

You can enable a DynamoDB Stream on the GamePoints table. You can read stream records from the stream to maintain materialized views, including aggregations, like the Rankings table. Set StreamViewType=NEW_IMAGE on your GamePoints table, and set up a Lambda function to consume stream records from your stream and update the points per player using atomic counters (UpdateItem, HK=player_id, UpdateExpression="ADD Points #stream_record_points", ExpressionAttributeValues={"#stream_record_points":[put the value from stream record here.]}). As the hash key of the Rankings table would still be the player ID, you could do full table scans of the Rankings table every hour to get the n highest players, or all the players and sort.

However, considering the size of fields (player_id and number of points probably do not take more than 100 bytes), an in memory cache updated by a Lambda function could equally well be used to track the descending order list of players and their total number of points in real time. Finally, if your application requires stateful processing of Stream records, you could use the Kinesis Client Library combined with the DynamoDB Streams Kinesis Adapter on your application server to achieve the same effect as subscribing a Lambda function to the Stream of the GamePoints table.

Upvotes: 4

Related Questions