Cassandra Data Modeling Design

Question

I am fairly new to Cassandra and have been reading a lot over the last month.
I am working on a small use case.
Query: Top X Players based on AmountPlayed in a TimeRange.

So at any given time range i would want to aggreagate players TotalAmountPlayed and derive Top X Players.

I followed the approach of creating UDF (using C*-2.2.0 version) for aggregation of AmountPlayed by a Player.

Below is my Time Series data model i designed for this use case.

CREATE COLUMNFAMILY PlayerRating
(
PlayerNumber int, ===> Unique account number
GameID text, ===> unique machine ID per slot
AmountPlayed double, ===> AmountPlayed by the player
EventTime timestamp, ===> Event generated TimeStamp
PRIMARY KEY ((PlayerNumber, GameID),EventTime)) WITH CLUSTERING ORDER BY(EventTime desc);

Please let me know if my Data Model design is proper for My Query.

Thank You !!

Jim Meyer · Accepted Answer

I think it might be easier to put all the players for each game into a single partition.

That way you could aggregate all the players with one query instead of making a separate query for each player. Then you could aggregate the playing time for each player into a map (see an example of how to define the UDF's for that here).

So your table would look something like this:

CREATE TABLE playing_time_by_game (game_id text, event_time int, player_id text, amount_played int, PRIMARY KEY (game_id, event_time));

Then create the UDF to total by player_id:

CREATE FUNCTION state_group_and_total( state map, type text, amount int )
     CALLED ON NULL INPUT
     RETURNS map
     LANGUAGE java AS '
     Integer count = (Integer) state.get(type);  if (count == null) count = amount; else count = count + amount; state.put(type, count); return state; ' ;

Then create the aggregate function:

CREATE OR REPLACE AGGREGATE group_and_total(text, int) 
     SFUNC state_group_and_total 
     STYPE map 
     INITCOND {};

Then insert some data:

SELECT * from playing_time_by_game ;

 game_id | event_time | amount_played | player_id
---------+------------+---------------+-----------
   game1 |          0 |             8 |   player1
   game1 |          1 |            12 |   player2
   game1 |          5 |             1 |   player2
   game1 |          8 |            50 |   player1
   game2 |          0 |           200 |   player1

Now you can aggregate by player_id:

SELECT group_and_total(player_id, amount_played) from playing_time_by_game;

 t2.group_and_total(player_id, amount_played)
----------------------------------------------
              {'player1': 258, 'player2': 13}

And you can restrict the query to the game partition and time range:

SELECT group_and_total(player_id, amount_played) from playing_time_by_game where game_id='game1' and event_time >=0 and event_time <=7;

 t2.group_and_total(player_id, amount_played)
----------------------------------------------
                {'player1': 8, 'player2': 13}

You could probably also define a FINALFUNC to sort and keep just the top ten items in the map. See this.

Cassandra Data Modeling Design

Answers (1)

Related Questions