Reputation: 1231
Which of these mysql storage methods would be faster to query (to retrieve friends of a particular user say @anyid):
Table: Friends
Columns: friend1 friend2
insert only one row per friendship and query as
select friend1
from FRIENDS
where friend2=@anyid
UNION ALL
select friend2
from FRIENDS
where friend1=@anyid
OR
insert two rows per friendship (insert as friend1 friend2 and then insert swapped entries eg, 1,2 and 2,1) and then simply query as
select friend2 from FRIENDS where friend1=@anyid
to me it seems the second method will perform better since it requires indexing on only a single column friend1 and a single query. The first method may require both columns to be indexed and requires running two sub queries. But in some posts i see people claiming the first method may perform better. On the plus side the first method will take half the storage when compared to the second method, would that make a huge difference for say million records.
Also one other thing, is it required to even store a seperate relationship_id
and creation_date
for any of the two methods above. what could be the exceptional benefit i would get other than giving me the relationship creation time or sequence, i think most people should be able to live without it or not?
Thanks for any answers!
Upvotes: 0
Views: 139
Reputation: 1270391
To me, the fundamental question is: "How many friendships are in the database?" If only A and B are friends, is the number "1" or "2". This answer dictates how you understand the friendship entity and hence how it should be implemented.
There is definitely a difference in the two approaches. To count the number of friends in the first way:
select count(*)
from friends
where USERID in (friend1, friend2);
versus:
select count(*)
from friends
where USERID = friend1;
The second can more easily take advantage of an index on friend1
and can be optimized to be faster than the first.
On the other hand, inserting a record takes twice as long. And finding friends of friends (and so on) requires handling more data and hence more processing time.
The friendship relationship does have additional characteristics, such as who initiated the request (this would often be friend1
in the first approach). Or the times the friendship was requested and accepted. Such additional information suggests the first approach to eliminate the duplication of data.
In other words, the question about the data structure -- as with almost all such questions -- is answered by how you understand the entities and how they are going to be used.
Upvotes: 3