better way store friends and retrieve friends of a particular user

Question

Which of these mysql storage methods would be faster to query (to retrieve friends of a particular user say @anyid):

Table: Friends

Columns: friend1 friend2

insert only one row per friendship and query as

select friend1 
from FRIENDS 
where friend2=@anyid
UNION ALL
select friend2 
from FRIENDS 
where friend1=@anyid

OR

insert two rows per friendship (insert as friend1 friend2 and then insert swapped entries eg, 1,2 and 2,1) and then simply query as

select friend2 from FRIENDS where friend1=@anyid

to me it seems the second method will perform better since it requires indexing on only a single column friend1 and a single query. The first method may require both columns to be indexed and requires running two sub queries. But in some posts i see people claiming the first method may perform better. On the plus side the first method will take half the storage when compared to the second method, would that make a huge difference for say million records.

Also one other thing, is it required to even store a seperate relationship_id and creation_date for any of the two methods above. what could be the exceptional benefit i would get other than giving me the relationship creation time or sequence, i think most people should be able to live without it or not?

Thanks for any answers!

Gordon Linoff · Accepted Answer

To me, the fundamental question is: "How many friendships are in the database?" If only A and B are friends, is the number "1" or "2". This answer dictates how you understand the friendship entity and hence how it should be implemented.

There is definitely a difference in the two approaches. To count the number of friends in the first way:

select count(*)
from friends
where USERID in (friend1, friend2);

versus:

select count(*)
from friends
where USERID = friend1;

The second can more easily take advantage of an index on friend1 and can be optimized to be faster than the first.

On the other hand, inserting a record takes twice as long. And finding friends of friends (and so on) requires handling more data and hence more processing time.

The friendship relationship does have additional characteristics, such as who initiated the request (this would often be friend1 in the first approach). Or the times the friendship was requested and accepted. Such additional information suggests the first approach to eliminate the duplication of data.

In other words, the question about the data structure -- as with almost all such questions -- is answered by how you understand the entities and how they are going to be used.

better way store friends and retrieve friends of a particular user

Answers (1)

Related Questions