Diego de Oliveira
Diego de Oliveira

Reputation: 681

MySQL database Schema for two relational tables

I'm trying to think on the most performant database schema for a specific data structure. There are two main entities: Courses and Themes. A Course is a collection of Themes. A Theme have fields like Videos, Resources and Video Total Time.

Visually representing this data structure:

- Course
|_ ID: 12345
|_ Themes: [A, B] (an array of UIDs)

- Theme A
  |_ Courses: [12345,67890] (an array of UIDs)
  |_ Videos: [1,2,3,4,5,7] (an array of UIDs)
  |_ Resources: [10,11,12] (an array of UIDs)
  |_ Video Total Time: 10000 (probably stored as seconds as tinyint field)
- Theme B
  |_ Courses: [12345,98765] (an array of UIDs)
  |_ Videos: [5,6,7,8] (an array of UIDs)
  |_ Resources: [12,13,14] (an array of UIDs)
  |_ Video Total Time: 20000 (probably stored as seconds as tinyint field)

What I'm trying to achieve is a database schema for two tables, one for Courses, and one for Themes. The idea would be to have a MySQL query that gets a Course and group all fields from the Themes. In other words, when I get the result of the MySQL query, (using PHP) I'll get an array or object like this:

Array(
  'ID' => 12345
  'themes' => [A,B]
  'videos' => [1,2,3,4,5,6,7,8]
  'resources' => [10,11,12,13,14]
  'video_total_time' => 30000
)

So, the point is that they are two relational databases. When I send a query to the DB requesting data from the video, I need to pull data from all the themes, and merge them together.

Since I'm not an expert on SQL / MySQL, I'm trying to learn a little bit about it while I try to figure out:

1) What is the best database schema for these two entities? Courses and Themes? Thinking specially about performance

2) Can I get the final data all using SQL? Or should I pull some data from the database, and then parse the data with PHP? What is usually faster?

3) What is the best way to store the array of UIDs? As a string? Or there's a better way to store it?

The primary goal of this is performance. I have this kind of data in a different database schema, merged with thousands of other kinds of data (WP databases, wp_posts / wp_postmeta tables), but right now it's really slow to get the information I need.

Any tips and suggestions are more than welcome!


Edit: Solved!

It was a tough call to decide which answer suits best my needs, because @TimMorton's and @PaulSpiegel's answers lead us to the same path, but with slightly different approaches. Tim's answer is great to understand how to properly design database schemas, taking into account many-to-many relationships, and how to organize your queries. But since the main focus of this question is improve performance, Paul's answer is more focused on that, with specific details about primary keys and indexes (which are fundamental to improve performance of the queries).

Anyways, I learned a lot about designing a database schema. Here's the lessons I learned:

I don't know if I'm correct with everything above, but it's what I learned so far. Hope this helps someone else too.

Upvotes: 1

Views: 1229

Answers (3)

Tim Morton
Tim Morton

Reputation: 2644

In it's simplest form, assuming no many to many relationships:

Course                Theme
--------              --------
CourseID <--+         ThemeId
Name        |         Name
            +------   CourseID
            |
            |      
            |         Video
            |         --------
            |         VideoID
            |         Name
            |         Length
            +------   CourseID
            |
            |
            |         Resource
            |         --------
            |         ResourceID
            |         Name
            +------   CourseID

In this form, a Course can have many themes, many videos, and many resources; but each theme, video, and resource can have only one course.

However, I don't think that's how you want it.

I would lean more towards

                      Course             Theme
                      --------           --------
            +---->    CourseId    +--->  ThemeId
            |         Name        |      Name
            |         ThemeId ----+      
            |
            |      
            |         Video
            |         --------
            |         VideoID
            |         Name
            |         Length
            +------   CourseID
            |
            |
            |         Resource
            |         --------
            |         ResourceID
            |         Name
            +------   CourseID

This allows a course to have only one theme, but many videos and resources. This allows the themes to have more than one course.

But it still doesn't quite fit the bill...

This one allows many courses to share the same theme, as well as have more than one theme:

                      Course         Course_Theme      Theme
                      --------       ------------      --------
            +---->    CourseId <----- CourseId   +-->  ThemeId
            |         Name            ThemeId ---+     Name
            |         ThemeId       
            |
            |      
            |         Video
            |         --------
            |         VideoID
            |         Name
            |         Length
            +------   CourseID
            |
            |
            |         Resource
            |         --------
            |         ResourceID
            |         Name
            +------   CourseID

As this stands now, each course can have many themes, videos, and resources. Each theme can have many courses. Each video and resource belongs to a course (i.e., can have only one course)

If a video or resource can be for more than one course, then you'll have to expand it just as I did with themes.


As per comment, everything is many to many. Notice I don't have any direct relations between themes and videos nor themes and resources. I don't think they will be necessary; you should be able to pick up what you need going through courses.

          Course         Course_Theme        Theme
          --------       ------------        --------
+---->    CourseId <---- CourseId                   
|         Name           ThemeId ----------> ThemeId
|                                            Name
|
|                        Course_Video        Video
|                        ------------        --------
+----------------------  CourseId                      
|                        VideoId ----------> VideoId
|                                            Name         
|                                            Length             
|                                                         
|                        Course_Resource     Resource
|                        ---------------     --------     
+----------------------- CourseId                        
                         ResourceId -------> ResourceId   
                                             Name         
                                             Url, etc.    

Now for the queries. Although it is possible to use aggregate functions along with group by, I think it makes far more sense to keep it simple and just pull things out one at a time.

Themes per course
SELECT T.* 
FROM COURSE C
INNER JOIN COURSE_THEME CT ON CT.COURSEID=C.COURSEID
INNER JOIN THEME T ON CT.THEMEID=T.THEMEID 
WHERE {insert your search conditions on course}

or, if you know CourseId:

SELECT T.*
FROM THEME T
INNER JOIN COURSE_THEME CT ON T.THEMEID = CT.THEMEID
WHERE CT.COURSEID = ?


likewise,

Videos per course
SELECT V.*
FROM COURSE C
INNER JOIN COURSE_VIDEO CV ON CV.COURSEID=CV.COURSEID
INNER JOIN VIDEO ON CV.VIDEOID=V.VIDEOID
WHERE {insert your search conditions on course}

or, if you know the CourseId:

SELECT V.*
FROM VIDEO V
INNER JOIN COURSE_VIDEO CV ON CV.VIDEOID = V.VIDEOID
WHERE CV.COURSEID = ?

to select the sum of the video lengths per course,

SELECT SUM(LENGTH) AS TOTAL
FROM VIDEO
INNER JOIN COURSE_VIDEO CV ON CV.VIDEOID = V.VIDEOID
WHERE CV.COURSEID = ?
GROUP BY CV.COURSEID

Now, the tricky part is videos per theme.  I am making an assumption here:  the set of videos per theme is the same as the set of videos per course per theme.

The long way around:

SELECT V.*
FROM VIDEO V
INNER JOIN COURSE_VIDEO CV ON VIDEO.VIDEOID = CV.VIDEOID
INNER JOIN COURSE C ON COURSEID = CV.COURSEID
INNER JOIN COURSE_THEME CT ON C.COURSEID = CT.COURSEID
INNER JOIN THEME T ON CT.THEMEID = T.THEMEID
WHERE THEMEID = ?

Blech. You can cut out the middlemen:

SELECT V.*
FROM VIDEO V
INNER JOIN COURSE_VIDEO CV ON VIDEO.VIDEOID = CV.VIDEOID
INNER JOIN COURSE_THEME CT ON CV.COURSEID = CT.COURSEID
WHERE CT.THEMEID = ?

When you have your tables normalized, you can get any piece of information from whatever starting point you choose. FWIW, your example is a fairly complicated one since everything is many to many relations.


Update

Even though I had courses as the root, even when themes are the root things don't change much:

          Theme          Course_Theme        Course
          --------       ------------        --------
+---->    ThemeId <----  ThemeId                   
|         Name           CourseId ---------> CourseId
|                                            Name
|
|                        Theme_Video         Video
|                        ------------        --------
+----------------------  ThemeId                      
|                        VideoId --------->  VideoId
|                                            Name         
|                                            Length             
|                                                         
|                        Theme_Resource      Resource
|                        --------------      --------     
+----------------------- ThemeId                        
                         ResourceId ------>  ResourceId   
                                             Name         
                                             Url, etc.    

In this configuration, courses have videos and resources through ThemeId, i.e.:

SELECT V.*
FROM COURSE_THEME CT 
INNER JOIN VIDEO_THEME VT ON VT.THEMEID = CT.THEMEID
INNER JOIN VIDEO V ON V.VIDEOID = VT.VIDEOID
WHERE CT.THEMEID = ?

Upvotes: 1

Paul Spiegel
Paul Spiegel

Reputation: 31812

Creating the schema

Step 1: Identify entities and their attributes

  • Course (ID, title, description)
  • Theme (ID, title, description)
  • Video (ID, title, description, duratation)
  • Ressource (ID, title, url)

Step 2: Identify relations

  • Theme => Course
  • Video => Theme
  • Ressource => Theme

Step 3: Create tables

  • courses
    • ID (PK)
    • title
    • description
  • themes
    • ID (PK)
    • course_id (FK)
    • title
    • description
  • videos
    • ID (PK)
    • theme_id (FK)
    • title
    • description
    • duratation
  • ressources
    • ID (PK)
    • theme_id (FK)
    • title
    • url

If themes can share videos and ressources, then it would be many-to-many relations. In this case you would need separate tables for those relations. Remove the theme_id column from videos and ressources and add the following tables:

  • themes_videos
    • theme_id (PK) (FK)
    • video_id (PK) (FK)
  • themes_ressources
    • theme_id (PK) (FK)
    • ressource_id (PK) (FK)

Here you should define composite primary keys on (theme_id, video_id) and (theme_id, ressource_id). Also create reverse indexes on (video_id, theme_id) and (ressource_id, theme_id).

Retrieving data

Assuming you know the ID of the course (which is 123), you can then retrieve the related data (from the many-to-many schema) with the following queries (which you execute one by one):

select c.*
from courses c
where c.id = 123;

select t.*
from themes t
where t.course_id = 123;

select distinct v.*
from themes t
join themes_videos tv on tv.theme_id = t.id
join videos v on v.id = tv.video_id
where t.course_id = 123;

select distinct r.*
from themes t
join themes_ressources tr on tr.theme_id = t.id
join ressources r on r.id = tr.ressource_id
where t.course_id = 123;

Then compose your array/object from retrieved data in PHP.

Performance

Trying to get all data with a single SQL query is not always a good idea. You just make your code and schema too complicated. Executing a couple of queries is not the end of the world. What you should avoid, is running executing a query in a loop (like: for each theme select related videos).

Upvotes: 1

Kashem Ali
Kashem Ali

Reputation: 39

Table Structure

Make tables like image as shown and use json encode/decode time of input/out. In the query you can have total time from the table.

Upvotes: -1

Related Questions