J.J.
J.J.

Reputation: 355

Database: What schema is more efficient?

Which of this approaches is the best? They are all normalized but this options confuses me.


First approach

This approach will have less empty data but will be more divided. It only save the player who try, kick or was expelled in each table.

Variant one

MatchPlayer(matchId, playerId)
Try(matchId, playerId, tries)
Kick(matchId, playerId, conversions, failedConversions, dropGoals)
Card(matchId, playerId, yellow, red)

Variant two

MatchPlayer(id, matchId, playerId)
Try(matchPlayerId, tries)
Kick(matchPlayerId, conversions, failedConversions, dropGoals)
Card(matchPlayerId, yellow, red)

Second approach

This approach will have empty data but will be less divided. It only save in MatchStats the players who score or were expelled.

Variant one

MatchPlayer(matchId, playerId)
MatchStats(matchId, playerId, tries, conversions, failedConversions, dropGoals, yellow, red)

Variant two

MatchPlayer(id, matchId, playerId)
MatchStats(matchPlayerId, tries, conversions, failedConversions, dropGoals, yellow, red)

Third approach

This approach will have much more empty data but will be much less divided. It save all the data as zeros even if the player didn't score or was expelled.

MatchPlayer(matchId, playerId, tries, conversions, failedConversions, dropGoals, yellow, red)

Fourth approach

This approach will have no empty data at all but will be much more divided.

Variant one

MatchPlayer(matchId, playerId)
Try(matchId, playerId, tries)
Conversion(matchId, playerId, conversions)
FailedConversion(matchId, playerId, failedConversions)
DropGoal(matchId, playerId, dropGoals)
YellowCard(matchId, playerId, yellow)
RedCard(matchId, playerId, red)

Variant two

MatchPlayer(matchPlayerId, matchId, playerId)
Try(matchPlayerId, tries)
Conversion(matchPlayerId, conversions)
FailedConversion(matchPlayerId, failedConversions)
DropGoal(matchPlayerId, dropGoals)
YellowCard(matchPlayerId, yellow)
RedCard(matchPlayerId, red)

What's your opinion?

Upvotes: 0

Views: 48

Answers (2)

Michael Petito
Michael Petito

Reputation: 13161

Third approach seems reasonable - it's a relationship table with attributes describing the relationship. You have a composite key for the relationship and your attributes have expected values any time the relationship exists.

In terms of efficiency, your table is not particularly wide, so the efficiency of scanning the table to compute aggregate statistics is not compromised. All of the approaches will have similar performance for queries such as:

  • List the stats for all players in a match
  • List the stats for a player's matches

These queries will rely on indexing of the relationship table in every case, with lookups into the joined tables. Given the number of rows you might expect in the results per match or per player, the joins are not significant. However, as noted the third approach eliminates the need for any joins while maintaining an appropriate level of normalization.

Upvotes: 1

Turophile
Turophile

Reputation: 3405

I would pick Second Approach, Variant 1. If you only add the data for each player after each match, then query it later, having all of the data in a single table will make most queries simpler (as well as the overall database structure).

While other approaches may be better design-wise, the multi-table joins will just be annoying.

However, I would include a row for every player in the match (with 0 values for tries, etc). This will ensure you don't have to treat them as a special case when compiling statistics. This will, of course, depend upon the actual queries and how you write them, so consider it as an option.

Note that this advice is based upon gut feel, and limited knowledge of your actual application.

Upvotes: 1

Related Questions