dzhu
dzhu

Reputation: 813

Database for NBA games

Question

I am wondering how to design a relational database for NBA games.

There are several similar questions in StackOverflow but no one gives a satisfying answer.

Star Schema database for NBA/Basketball statistics

Database for Basketball teams/games/scores per season

Database Design NBA

Apparently the game data is interesting for basketball fans and tons of websites provide these statistics.
Also, video games for basketball, football, ... should maintain similar data too.

Is there a standard/recommended way to design these databases?

Example

The solution that comes into my mind looks like:

Team and Game

Country (CountryCode pk, CountryName)

City (Country pk fk, CityCode pk, CityName)

Team (TeamCode pk, TeamName, City fk)

Person (Name pk, Birthday pk, Height, Weight, HomeTown fkCity)

Employment (Employee pk fkPerson, JoiningDate pk, TeamCode fk2, DepartureDate, Salary, JerseyNumber)

Game (City pk fk1, Date pk, HostTeam fk2, VisitingTeam fk3)

GamePlayer (Game pk fk, Player pk fkEmployee, Position)

Game events

Shooting (Game pk fk1, TimeOfGame pk, Player fk2, ShootingType, Missed)

Rebound (Game pk fk1, TimeOfGame pk, Player fk2, IsOffensive)

Block (Game pk fk1, TimeOfGame pk, Blocker fk2, Blocked fk3)

Steal (Game pk fk1, TimeOfGame pk, Stealer fk2, Stealed fk3)

Other game events...

Another way is to store only the stats (derived data) for each game.

GameStats (Game pk fk1, Player pk fk2, Minutes, FieldGoal, FieldGoalMissed, ThreePointMade, ThreePointMissed, FreeThrow, FreeThrowMissed, DefensiveRebounds, OffensiveRebounds, Blocks, ...)

Historical Data

One difficulty I met for this data model is that players can change teams, change jersey numbers so we need to store historical data.

The events (Shooting, Rebound, Timeout, Substitution, ...) in a game are historical data too.

Derived Data

Another difficulty I met is to decide whether to store derived data.

Since derived data causes Update Anomaly, we should avoid them. However, I find it difficult to avoid derived data completely.

For example

  1. The game statistics are derived data. In order to avoid them, we have to store the events.
  2. The minutes a player played for a game is derived from the events (Substitution/Timeout/...).
  3. Some statistics are related, Turnover is derivable from Steal/Block/...
  4. For a game event, the game time can be derived from the time of day and other events (Substitution/Timeout/...).

Shall we make a compromise and store derived data for simplicity?
Does a truly normalized database contain derived data?


Edit 1 - Neutral Stadium

To account for neutral stadiums where no team is the home team, we can use the following design.

Stadium (City pk, Name pk, Capacity)

Game (Stadium pk fk, Date pk)

TeamRole (TeamRole pk) ['Home', 'Away', 'Neutral']

GameTeam (Game pk fk, Team pk fk, TeamRole pk fk)

Upvotes: 2

Views: 6366

Answers (1)

Zohar Peled
Zohar Peled

Reputation: 82474

It's impossible to give a complete answer without knowing all the details, but I can give you some pointers that will make your life easier.

In your description, tables Country, City and Team seems fine, but I don't understand why you need the City fk in the Person's table (unless it's city of birth, but then the column name should make that clear).

Employment

Now, about the Employment table - My suggestion is a little different:
First, you need to decide if you want to keep only the players or all game related stuff members of the team (such as team coach) in that table.
I would suggest the second option, meaning that you need to keep another table for job types (i.e player, coach, assistance coach etc.) and have an fk to it in the employment table.

Historical data

As for the historical data problem - there are some rules you want to take into consideration: for example, if your player is currently employed in a team, he probably can't be employed in another team at the same time, However, he can be a part of his countries national team while employed in a regular team. To cope with that, you will have to create some business logic rules in the forms of check constraints and perhaps some triggers to keep data integrity.
Coping with the teams change is quite easy, since you have the player's joining date and departure date for the team - all you need is to make sure that there is no overlapping between employments for any player in regular teams (remember the national team is an exception).

Changes while the player is employed in a team

To cope with the changes in jersey number, or whatever data that might be changed while the player is employed by the team, I suggest adding a table for employment details, that will be connected to the employment table, have a start date and end date and keep all the data that might change while the player is employed in a team. Again, you will need to make sure records do not overlap (date wise) for each employment id.

Active players

The active players table should be calculated per game, so it should have the game id as an fk. I suggest a simply table with gameId, teamId and playerId, while the primary key is gameId+PlayerId (I've left the teamId out of the primary key since it's another layer of protection in case the employment data is messed up and a player is considered employed in both teams on the date of the game)

Games

The Game table should have a primary key of game date + home team + away team. Note that games might be held on neutral stadiums, so you would want to test for games where the home team and away team are flipped before inserting new game data in your table. Also, you might want to keep a stadium table that will have an fk to the city and keep the stadium id in the games table instead of the city id.

Game statistics

You should strive to keep a single table to describe all events in the game. This table should of course have an fk to the games table and an fk to the teams table. From your question I understand that the statistics is done on the player level, so you also want to keep an fk to the players table.
Add a table for statistics details description and an fk to that table in the games statistics table. Basically, it should look like this:

StatisticsDetails (Detail_Id, Detail_Name)
and should hold data such as shoot, pass, rebound etc.

In you game statistics table you might need to log events that involves several players. there are a few ways to do it, I suggest to keep it as simple as possible: keep PlayerID1, PlayerId2, PlayerId3 in the table, and have all of them nullable except the first one.

Upvotes: 4

Related Questions