Database: What schema is more efficient?

Question

Which of this approaches is the best? They are all normalized but this options confuses me.

First approach

This approach will have less empty data but will be more divided. It only save the player who try, kick or was expelled in each table.

Variant one

MatchPlayer(matchId, playerId)
Try(matchId, playerId, tries)
Kick(matchId, playerId, conversions, failedConversions, dropGoals)
Card(matchId, playerId, yellow, red)

Variant two

MatchPlayer(id, matchId, playerId)
Try(matchPlayerId, tries)
Kick(matchPlayerId, conversions, failedConversions, dropGoals)
Card(matchPlayerId, yellow, red)

Second approach

This approach will have empty data but will be less divided. It only save in MatchStats the players who score or were expelled.

Variant one

MatchPlayer(matchId, playerId)
MatchStats(matchId, playerId, tries, conversions, failedConversions, dropGoals, yellow, red)

Variant two

MatchPlayer(id, matchId, playerId)
MatchStats(matchPlayerId, tries, conversions, failedConversions, dropGoals, yellow, red)

Third approach

This approach will have much more empty data but will be much less divided. It save all the data as zeros even if the player didn't score or was expelled.

MatchPlayer(matchId, playerId, tries, conversions, failedConversions, dropGoals, yellow, red)

Fourth approach

This approach will have no empty data at all but will be much more divided.

Variant one

MatchPlayer(matchId, playerId)
Try(matchId, playerId, tries)
Conversion(matchId, playerId, conversions)
FailedConversion(matchId, playerId, failedConversions)
DropGoal(matchId, playerId, dropGoals)
YellowCard(matchId, playerId, yellow)
RedCard(matchId, playerId, red)

Variant two

MatchPlayer(matchPlayerId, matchId, playerId)
Try(matchPlayerId, tries)
Conversion(matchPlayerId, conversions)
FailedConversion(matchPlayerId, failedConversions)
DropGoal(matchPlayerId, dropGoals)
YellowCard(matchPlayerId, yellow)
RedCard(matchPlayerId, red)

What's your opinion?

Michael Petito · Accepted Answer

Third approach seems reasonable - it's a relationship table with attributes describing the relationship. You have a composite key for the relationship and your attributes have expected values any time the relationship exists.

In terms of efficiency, your table is not particularly wide, so the efficiency of scanning the table to compute aggregate statistics is not compromised. All of the approaches will have similar performance for queries such as:

List the stats for all players in a match
List the stats for a player's matches

These queries will rely on indexing of the relationship table in every case, with lookups into the joined tables. Given the number of rows you might expect in the results per match or per player, the joins are not significant. However, as noted the third approach eliminates the need for any joins while maintaining an appropriate level of normalization.

Database: What schema is more efficient?

First approach

Variant one

Variant two

Second approach

Variant one

Variant two

Third approach

Fourth approach

Variant one

Variant two

Answers (2)

Related Questions