A simplified version of Twitter. Understanding many-to-many relationships between tables in the database

Question

I am reading this article about a Twitter-like application. Tweets, users, likes, etc. are stored in a relational database. The database scheme is described here and drawn here.

As an Android developer, I coded my sample using SQLite. This is how I would code it:

create table users (_id integer primary key, username text unique, first_name text, last_name text);
create table tweets (_id integer primary key, content text, created_at integer, user_id integer, foreign key(user_id) references users(_id));
create table connections (_id integer primary key, follower_id integer, followee_id integer, created_at integer, foreign key(follower_id) references users(_id), foreign key (followee_id) references users(_id));
create table favorites (_id integer primary key, user_id integer, tweet_id integer, foreign key (user_id) references users(_id), foreign key (tweet_id) references tweets(_id));

Now let's insert some data.

users:

insert into users values (1, 'user1', 'Lorem', 'Ipsum');
insert into users values (2, 'user2', 'Dolor', 'Sit');
insert into users values (3, 'user3', 'Foo', 'Bar');
insert into users values (4, 'user4', 'Qwerty', 'Trewq');

some tweets:

insert into tweets values(10, '1 Tweet from user1', 1100, 1);
insert into tweets values(11, '2 Tweet from user1', 1101, 1);
insert into tweets values(12, '3 Tweet from user1', 1102, 1);
insert into tweets values(13, '4 Tweet from user1', 1103, 1);

insert into tweets values(14, '1 Tweet from user2', 1103, 2);
insert into tweets values(15, '2 Tweet from user2', 1103, 2);

insert into tweets values(16, '1 Tweet from user3', 1103, 3);
insert into tweets values(17, '2 Tweet from user3', 1103, 3);

insert into tweets values(18, '1 Tweet from user4', 1107, 4);

favorites (the same as likes):

insert into favorites values(1, 2, 11);
insert into favorites values(2, 3, 13);
insert into favorites values(3, 4, 15);

There is a question about the database scheme:

Do you think you could support with our database design the ability to display a page for a given user with their latest tweets that were favorited at least once?

Yes, this is my query:

sqlite> select favorites._id, tweets._id as tweet_row_id, tweets.content from favorites join tweets on tweets.user_id=1 and tweets._id = favorites.tweet_id order by tweets._id desc limit 1;
_id         tweet_row_id  content
----------  ------------  ------------------
2           13            4 Tweet from user1

Explanation:

The left dataset is the table favorites. The right dataset is the table tweets. I join the two datasets. Then tweets.user_id=1 and tweets._id = favorites.tweet_id is evaluated for each row of the resulting dataset as a boolean expression. If the result is true, the row is included. order by tweets._id desc is used to get the latest tweets (the greater tweets._id is, the newer the tweet is). limit is used to limit the number of rows. If the user has been using our Twitter-like app for years, we'll show the latest 10 or 20 tweets.
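A minimal sketch of the query above, run through Python's built-in sqlite3 module against an in-memory database with a subset of the sample data. The second favorite on tweet 11 and the EXISTS variant are assumptions added for illustration, not part of the original sample: with a plain join, a tweet that was favorited twice shows up twice, whereas EXISTS returns each favorited tweet once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table users (_id integer primary key, username text unique);
create table tweets (_id integer primary key, content text, created_at integer,
                     user_id integer references users(_id));
create table favorites (_id integer primary key,
                        user_id integer references users(_id),
                        tweet_id integer references tweets(_id));
insert into users values (1, 'user1'), (2, 'user2'), (3, 'user3');
insert into tweets values (11, '2 Tweet from user1', 1101, 1),
                          (12, '3 Tweet from user1', 1102, 1),
                          (13, '4 Tweet from user1', 1103, 1);
-- tweet 11 is favorited twice (the third row is an assumption for illustration)
insert into favorites values (1, 2, 11), (2, 3, 13), (3, 3, 11);
""")

# The join from the question: one result row per favorite.
join_rows = conn.execute("""
    select tweets._id, tweets.content
    from favorites join tweets
      on tweets.user_id = ? and tweets._id = favorites.tweet_id
    order by tweets._id desc limit 10
""", (1,)).fetchall()

# EXISTS variant: one result row per tweet that has at least one favorite.
exists_rows = conn.execute("""
    select t._id, t.content from tweets t
    where t.user_id = ?
      and exists (select 1 from favorites f where f.tweet_id = t._id)
    order by t._id desc limit 10
""", (1,)).fetchall()

print(join_rows)
print(exists_rows)
```

Whether duplicates matter depends on the page being built; for "tweets favorited at least once", the EXISTS form seems closer to the wording of the question.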

My questions.

Is there anything wrong with my database scheme? I omitted not null, unique, and other column constraints for simplicity.
Here the author of the original article says:

The first relation is addressed by sticking the user ID to each tweet. This is possible because each tweet is created by exactly one user. It’s a bit more complicated when it comes to following users and favoriting tweets. The relationship there is many-to-many.

"The first relation" is users-tweets.

Why do we need a many-to-many here? In my scheme I only use a one-to-many.

Update 1

Gottfried Lesigang · Accepted Answer

I recently posted an answer here, where the OP, like you in this question, was unsure about 1:n and n:m.

I assume that your final sentence is your actual question:

Why do we need a many-to-many here? In my scheme I only use a one-to-many

The relation user-tweets is 1:n...

Think in objects

user (id, name, ...)
tweet (id, author (FK on user), datetime, content, ...)

The like is an object with specific details of its own:

like (id, userid,tweetid,datetime,...)

For this you need a mapping table (you call it favorites).

There is a 1:n-relation from users to this mapping and a 1:n-relation from tweets to this mapping.
These two 1:n-relations form the m:n-relation together.
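To illustrate, here is a small sketch (Python's built-in sqlite3, in-memory database) that walks the mapping table in both directions. The table and column names follow the question's scheme; the sample rows and the helper names users_who_liked and tweets_liked_by are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table users (_id integer primary key, username text unique);
create table tweets (_id integer primary key, content text,
                     user_id integer references users(_id));
create table favorites (_id integer primary key,
                        user_id integer references users(_id),
                        tweet_id integer references tweets(_id));
insert into users values (1, 'user1'), (2, 'user2'), (3, 'user3');
insert into tweets values (10, 'first', 1), (11, 'second', 1);
insert into favorites values (1, 2, 10), (2, 3, 10), (3, 2, 11);
""")

def users_who_liked(tweet_id):
    """1:n from tweets to the mapping: all users who liked one tweet."""
    rows = conn.execute("""
        select u.username from favorites f
        join users u on u._id = f.user_id
        where f.tweet_id = ? order by u.username
    """, (tweet_id,))
    return [r[0] for r in rows]

def tweets_liked_by(user_id):
    """1:n from users to the mapping: all tweets liked by one user."""
    rows = conn.execute("""
        select t._id from favorites f
        join tweets t on t._id = f.tweet_id
        where f.user_id = ? order by t._id
    """, (user_id,))
    return [r[0] for r in rows]

print(users_who_liked(10))  # one tweet, many users
print(tweets_liked_by(2))   # one user, many tweets
```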

Now each tweet can be liked by many users and each user can like many tweets, but one user should (probably) not like the same tweet twice (a unique key, or even a two-column PK?). You might also introduce a CHECK constraint to ensure that the liking user and the tweet's author are not the same (don't like your own tweets).
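A sketch of those two rules in SQLite, via Python's built-in sqlite3. The two-column primary key rejects a second like of the same tweet by the same user. The self-like rule is written as a trigger here, which is an assumption of this sketch: SQLite does not allow subqueries in a CHECK constraint, so a CHECK on the mapping table cannot look up the tweet's author. The table name likes, the trigger name no_self_like and the helper try_like are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table users (_id integer primary key, username text unique);
create table tweets (_id integer primary key, content text,
                     user_id integer not null references users(_id));
create table likes (
    user_id    integer not null references users(_id),
    tweet_id   integer not null references tweets(_id),
    created_at integer,
    primary key (user_id, tweet_id)  -- one like per user per tweet
);
-- hypothetical trigger: reject a like when the liker is the tweet's author
create trigger no_self_like before insert on likes
when (select user_id from tweets where _id = new.tweet_id) = new.user_id
begin
    select raise(abort, 'users cannot like their own tweets');
end;
insert into users values (1, 'user1'), (2, 'user2');
insert into tweets values (10, 'hello', 1);
""")

def try_like(user_id, tweet_id):
    """Return True if the like was stored, False if a rule rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "insert into likes (user_id, tweet_id) values (?, ?)",
                (user_id, tweet_id),
            )
        return True
    except sqlite3.DatabaseError:
        return False

print(try_like(2, 10))  # first like by user2
print(try_like(2, 10))  # the same like again
print(try_like(1, 10))  # the author liking their own tweet
```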

As a side note:

Is there anything wrong with my database scheme

You should never create constraints without naming them:

CREATE TABLE Dummy
(
 ID INT IDENTITY CONSTRAINT PK_Dummy PRIMARY KEY
,UserID INT NOT NULL CONSTRAINT FK_Dummy_UserID FOREIGN KEY REFERENCES [User](id)
,...
)

If this database is ever installed on different systems, unnamed constraints will get different (random) names on each system, and future upgrade scripts will cause you serious pain.

UPDATE: example for the side note

In your comment you ask what this last sentence is about. Try this:

CREATE DATABASE testDB;
GO
USE testDB;
GO
CREATE TABLE testTbl1(ID INT IDENTITY PRIMARY KEY,SomeValue INT UNIQUE);
CREATE TABLE testTbl2(ID INT IDENTITY PRIMARY KEY,FKtoTbl1 INT NOT NULL FOREIGN KEY REFERENCES testTbl1(ID));
GO
CREATE TABLE testTbl3(ID INT IDENTITY CONSTRAINT PK_3 PRIMARY KEY,SomeValue INT CONSTRAINT UQ_3_SomeValue UNIQUE);
CREATE TABLE testTbl4(ID INT IDENTITY CONSTRAINT PK_4 PRIMARY KEY,FKtoTbl3 INT NOT NULL CONSTRAINT FK_4_FKtoTbl3 FOREIGN KEY REFERENCES testTbl3(ID));

GO
SELECT * FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS;
GO
USE master;
GO
DROP DATABASE testDB;
GO

One column in your result looks like this:

CONSTRAINT_NAME
------------------------------
PK__testTbl1__3214EC27ABEA2C0C
UQ__testTbl1__0E5C381C04C8AF66
PK__testTbl2__3214EC272784631C
FK__testTbl2__FKtoTb__1367E606
PK_3
UQ_3_SomeValue
PK_4
FK_4_FKtoTbl3

If this script is run twice, the names you specified stay exactly as you defined them. The unnamed constraints get a new random name, like PK__testTbl1__3214EC27ABEA2C0C, on every run. Now imagine you need to create an upgrade script for several installed systems where one constraint has to be dropped or modified. How would you do this if you do not know its name?
