TWO COLUMN "OR" query speed versus SINGLE COLUMN "DOUBLE THE ROWS" query speed

Question

So let's say we have a table that contains several columns, and we index on TWO COLUMNS: Player 1 and Player 2:

GAME   PLAYER 1    PLAYER 2   ...
====   ========    ========  
1      John        Dan
2      Carl        William
3      Carl        John
...

An alternate table structure indexing only on a SINGLE COLUMN: Player 1 (and with double the rows):

GAME   PLAYER 1    PLAYER 2   ...
====   ========    ========  
1      John        Dan
1      Dan         John
2      Carl        William
2      William     Carl
3      Carl        John
3      John        Carl
...

I want to query for all the games John played in. Which table structure/approach is better?

Things to consider:

This is a simplified version, and one must take into account the tables will contain at least 100's of thousands of rows, and there will be several joins within the query.
There will be joins on the player columns, so utilizing the TWO COLUMN approach with an "OR"/"UNION" will also double some of the joins as well.
The SINGLE COLUMN approach will require a "GROUP BY" on the game column.
We don't want to run out of memory, queries should stay running in memory!
The TWO COLUMN approach is already getting slow, and now relies on Mysql's built-in cacheing to keep it running at a comfortable speed. I'm thinking of going to the SINGLE COLUMN approach, but unsure of the consequences.
Another solution is to have an additional table with Player and Game(foreign key).

David Garrison · Accepted Answer

In this simple case, with an index on both columns you are going to be essentially doubling the table size anyways with the Index.

If there are more than 2 player columns though, the second approach will have a number of rows equal to the number of games to the power of the number of player columns, while the index approach will just have the number of games multiplied by the number of player columns for index space.

Either way, to improve performance, it would be best to replace the names with IDs that reference the players' names in another table.

Also, it seems to me there might be a simpler approach.

GAME   PLAYER
====   ========
1      John
1      Dan
2      Carl
2      William
3      Carl
3      John

No need to duplicate the data. Here you can still retrieve all the players that were in game 1, and now you can more easily retrieve the games that each player is in. Here, you only need one index. This could however possibly complicate some other queries.

TWO COLUMN "OR" query speed versus SINGLE COLUMN "DOUBLE THE ROWS" query speed

Answers (2)

Related Questions

TWO COLUMN &quot;OR&quot; query speed versus SINGLE COLUMN &quot;DOUBLE THE ROWS&quot; query speed

Answers (2)

Related Questions

TWO COLUMN "OR" query speed versus SINGLE COLUMN "DOUBLE THE ROWS" query speed