Reputation: 681
I'm trying to think on the most performant database schema for a specific data structure. There are two main entities: Courses and Themes. A Course is a collection of Themes. A Theme have fields like Videos, Resources and Video Total Time.
Visually representing this data structure:
- Course
|_ ID: 12345
|_ Themes: [A, B] (an array of UIDs)
- Theme A
|_ Courses: [12345,67890] (an array of UIDs)
|_ Videos: [1,2,3,4,5,7] (an array of UIDs)
|_ Resources: [10,11,12] (an array of UIDs)
|_ Video Total Time: 10000 (probably stored as seconds as tinyint field)
- Theme B
|_ Courses: [12345,98765] (an array of UIDs)
|_ Videos: [5,6,7,8] (an array of UIDs)
|_ Resources: [12,13,14] (an array of UIDs)
|_ Video Total Time: 20000 (probably stored as seconds as tinyint field)
What I'm trying to achieve is a database schema for two tables, one for Courses, and one for Themes. The idea would be to have a MySQL query that gets a Course and group all fields from the Themes. In other words, when I get the result of the MySQL query, (using PHP) I'll get an array or object like this:
Array(
'ID' => 12345
'themes' => [A,B]
'videos' => [1,2,3,4,5,6,7,8]
'resources' => [10,11,12,13,14]
'video_total_time' => 30000
)
So, the point is that they are two relational databases. When I send a query to the DB requesting data from the video, I need to pull data from all the themes, and merge them together.
Since I'm not an expert on SQL / MySQL, I'm trying to learn a little bit about it while I try to figure out:
1) What is the best database schema for these two entities? Courses and Themes? Thinking specially about performance
2) Can I get the final data all using SQL? Or should I pull some data from the database, and then parse the data with PHP? What is usually faster?
3) What is the best way to store the array of UIDs? As a string? Or there's a better way to store it?
The primary goal of this is performance. I have this kind of data in a different database schema, merged with thousands of other kinds of data (WP databases, wp_posts / wp_postmeta tables), but right now it's really slow to get the information I need.
Any tips and suggestions are more than welcome!
It was a tough call to decide which answer suits best my needs, because @TimMorton's and @PaulSpiegel's answers lead us to the same path, but with slightly different approaches. Tim's answer is great to understand how to properly design database schemas, taking into account many-to-many relationships, and how to organize your queries. But since the main focus of this question is improve performance, Paul's answer is more focused on that, with specific details about primary keys and indexes (which are fundamental to improve performance of the queries).
Anyways, I learned a lot about designing a database schema. Here's the lessons I learned:
I don't know if I'm correct with everything above, but it's what I learned so far. Hope this helps someone else too.
Upvotes: 1
Views: 1229
Reputation: 2644
In it's simplest form, assuming no many to many relationships:
Course Theme
-------- --------
CourseID <--+ ThemeId
Name | Name
+------ CourseID
|
|
| Video
| --------
| VideoID
| Name
| Length
+------ CourseID
|
|
| Resource
| --------
| ResourceID
| Name
+------ CourseID
In this form, a Course can have many themes, many videos, and many resources; but each theme, video, and resource can have only one course.
However, I don't think that's how you want it.
I would lean more towards
Course Theme
-------- --------
+----> CourseId +---> ThemeId
| Name | Name
| ThemeId ----+
|
|
| Video
| --------
| VideoID
| Name
| Length
+------ CourseID
|
|
| Resource
| --------
| ResourceID
| Name
+------ CourseID
This allows a course to have only one theme, but many videos and resources. This allows the themes to have more than one course.
But it still doesn't quite fit the bill...
This one allows many courses to share the same theme, as well as have more than one theme:
Course Course_Theme Theme
-------- ------------ --------
+----> CourseId <----- CourseId +--> ThemeId
| Name ThemeId ---+ Name
| ThemeId
|
|
| Video
| --------
| VideoID
| Name
| Length
+------ CourseID
|
|
| Resource
| --------
| ResourceID
| Name
+------ CourseID
As this stands now, each course can have many themes, videos, and resources. Each theme can have many courses. Each video and resource belongs to a course (i.e., can have only one course)
If a video or resource can be for more than one course, then you'll have to expand it just as I did with themes.
As per comment, everything is many to many. Notice I don't have any direct relations between themes and videos nor themes and resources. I don't think they will be necessary; you should be able to pick up what you need going through courses.
Course Course_Theme Theme
-------- ------------ --------
+----> CourseId <---- CourseId
| Name ThemeId ----------> ThemeId
| Name
|
| Course_Video Video
| ------------ --------
+---------------------- CourseId
| VideoId ----------> VideoId
| Name
| Length
|
| Course_Resource Resource
| --------------- --------
+----------------------- CourseId
ResourceId -------> ResourceId
Name
Url, etc.
Now for the queries. Although it is possible to use aggregate functions along with group by, I think it makes far more sense to keep it simple and just pull things out one at a time.
Themes per course
SELECT T.*
FROM COURSE C
INNER JOIN COURSE_THEME CT ON CT.COURSEID=C.COURSEID
INNER JOIN THEME T ON CT.THEMEID=T.THEMEID
WHERE {insert your search conditions on course}
or, if you know CourseId:
SELECT T.*
FROM THEME T
INNER JOIN COURSE_THEME CT ON T.THEMEID = CT.THEMEID
WHERE CT.COURSEID = ?
likewise,
Videos per course
SELECT V.*
FROM COURSE C
INNER JOIN COURSE_VIDEO CV ON CV.COURSEID=CV.COURSEID
INNER JOIN VIDEO ON CV.VIDEOID=V.VIDEOID
WHERE {insert your search conditions on course}
or, if you know the CourseId:
SELECT V.*
FROM VIDEO V
INNER JOIN COURSE_VIDEO CV ON CV.VIDEOID = V.VIDEOID
WHERE CV.COURSEID = ?
to select the sum of the video lengths per course,
SELECT SUM(LENGTH) AS TOTAL
FROM VIDEO
INNER JOIN COURSE_VIDEO CV ON CV.VIDEOID = V.VIDEOID
WHERE CV.COURSEID = ?
GROUP BY CV.COURSEID
Now, the tricky part is videos per theme. I am making an assumption here: the set of videos per theme is the same as the set of videos per course per theme.
The long way around:
SELECT V.*
FROM VIDEO V
INNER JOIN COURSE_VIDEO CV ON VIDEO.VIDEOID = CV.VIDEOID
INNER JOIN COURSE C ON COURSEID = CV.COURSEID
INNER JOIN COURSE_THEME CT ON C.COURSEID = CT.COURSEID
INNER JOIN THEME T ON CT.THEMEID = T.THEMEID
WHERE THEMEID = ?
Blech. You can cut out the middlemen:
SELECT V.*
FROM VIDEO V
INNER JOIN COURSE_VIDEO CV ON VIDEO.VIDEOID = CV.VIDEOID
INNER JOIN COURSE_THEME CT ON CV.COURSEID = CT.COURSEID
WHERE CT.THEMEID = ?
When you have your tables normalized, you can get any piece of information from whatever starting point you choose. FWIW, your example is a fairly complicated one since everything is many to many relations.
Update
Even though I had courses as the root, even when themes are the root things don't change much:
Theme Course_Theme Course
-------- ------------ --------
+----> ThemeId <---- ThemeId
| Name CourseId ---------> CourseId
| Name
|
| Theme_Video Video
| ------------ --------
+---------------------- ThemeId
| VideoId ---------> VideoId
| Name
| Length
|
| Theme_Resource Resource
| -------------- --------
+----------------------- ThemeId
ResourceId ------> ResourceId
Name
Url, etc.
In this configuration, courses have videos and resources through ThemeId
, i.e.:
SELECT V.*
FROM COURSE_THEME CT
INNER JOIN VIDEO_THEME VT ON VT.THEMEID = CT.THEMEID
INNER JOIN VIDEO V ON V.VIDEOID = VT.VIDEOID
WHERE CT.THEMEID = ?
Upvotes: 1
Reputation: 31812
If themes can share videos and ressources, then it would be many-to-many relations.
In this case you would need separate tables for those relations.
Remove the theme_id
column from videos
and ressources
and add the following tables:
Here you should define composite primary keys on (theme_id, video_id)
and (theme_id, ressource_id)
.
Also create reverse indexes on (video_id, theme_id)
and (ressource_id, theme_id)
.
Assuming you know the ID of the course (which is 123), you can then retrieve the related data (from the many-to-many schema) with the following queries (which you execute one by one):
select c.*
from courses c
where c.id = 123;
select t.*
from themes t
where t.course_id = 123;
select distinct v.*
from themes t
join themes_videos tv on tv.theme_id = t.id
join videos v on v.id = tv.video_id
where t.course_id = 123;
select distinct r.*
from themes t
join themes_ressources tr on tr.theme_id = t.id
join ressources r on r.id = tr.ressource_id
where t.course_id = 123;
Then compose your array/object from retrieved data in PHP.
Trying to get all data with a single SQL query is not always a good idea. You just make your code and schema too complicated. Executing a couple of queries is not the end of the world. What you should avoid, is running executing a query in a loop (like: for each theme select related videos).
Upvotes: 1
Reputation: 39
Make tables like image as shown and use json encode/decode time of input/out. In the query you can have total time from the table.
Upvotes: -1