Reputation: 581
This relates to a post I made here
where initially I was interested in WordPress taxonomies
, but the more I think about my case, the more I think a custom table
approach makes sense.
So I am curious if there are any database experts who can advise me on table structure given my scenario:
I am setting up my site so I can teach guitar courses
. Perhaps of importance, I will not have many courses. Maybe 4-8.
Under a given course
, I have
Course 1
Lesson 1.1
Topic 1.1.1
Quiz 1.1.1.1
Quiz 1.1.1.2
Forum Topic 1.1.1.1
Topic 1.1.2
Quiz 1.1.2.1
Exercise 1.1.2.1
etc
Course 2
Lesson 2.1
Topic 2.1.1
Quiz 2.1.1.1
Forum Topic 2.1.1.1
Topic 2.1.2
etc
etc
I also use a course forum (BBPress
, which has its own taxonomy system), and at times need to tie individual forum topics to lesson topics.
The types of queries I need to do are
You probably get the idea.
Here is example dataset:
CREATE TABLE `posts` (
`ID` bigint(20) UNSIGNED NOT NULL,
`post_title` text COLLATE utf8mb4_unicode_520_ci NOT NULL,
`post_parent` bigint(20) UNSIGNED NOT NULL DEFAULT '0',
`post_type` varchar(20) COLLATE utf8mb4_unicode_520_ci NOT NULL DEFAULT 'post'
) ;
INSERT INTO posts
(ID, post_type, post_title, post_parent)
VALUES
(1, 'course', 'Course 1', 0),
(2, 'lesson', 'Lesson 1.1', 0),
(3, 'course-topic', 'Topic 1.1.1', 0),
(4, 'quiz', 'Quiz 1.1.1.1', 0),
(5, 'quiz', 'Quiz 1.1.1.2', 0),
(6, 'quiz', 'Quiz 1.1.1.3', 0),
(7, 'course-topic', 'Assignment 1.1.1.1', 0),
(8, 'forum', 'Course 1 Forum', 0),
(9, 'forum', 'Course 1 Discussions', 8),
(10, 'topic', 'Discussion 1.1.1.1', 9),
(11, 'course-topic', 'Topic 1.1.2', 0),
(12, 'quiz', 'Quiz 1.1.2.1', 0),
(13, 'lesson', 'Lesson 1.2', 0),
(14, 'course-topic', 'Topic 1.2.1', 0),
(15, 'topic', 'Discussion 1.2.1.1', 9);
Initially I set up taxonomy tables in WordPress and made a sqlfiddle with some example data and some example queries.
Here is the fiddle:
http://sqlfiddle.com/#!9/ffa734/3
A query just to get lessons for a given course ID involves 4 tables and looks like
-- get all lessons for a given course
SELECT course_rels.object_id as course_id, lesson.post_title as lesson_title FROM posts AS lesson
INNER JOIN wp_term_relationships AS lesson_rels ON lesson.ID = lesson_rels.object_id
INNER JOIN wp_term_relationships AS course_rels ON course_rels.object_id = 1
INNER JOIN wp_term_taxonomy AS lesson_tax
ON
(lesson_tax.term_taxonomy_id = lesson_rels.term_taxonomy_id
AND
lesson_tax.parent = course_rels.term_taxonomy_id)
INNER JOIN wp_terms AS terms ON terms.term_id = lesson_tax.term_id
WHERE terms.name = 'Lesson';
After I realized how awkward the queries become using taxonomies, I decided to make a custom table which given post data above, would look like
CREATE TABLE `relations` (
`post_id` bigint(20) UNSIGNED NOT NULL,
`related_course` bigint(20) UNSIGNED NOT NULL DEFAULT '0',
`related_lesson` bigint(20) UNSIGNED NOT NULL DEFAULT '0',
`related_topic` bigint(20) UNSIGNED NOT NULL DEFAULT '0',
`post_type` varchar(16) COLLATE utf8mb4_unicode_520_ci NOT NULL DEFAULT ''
);
INSERT INTO relations
(post_id, related_course, related_lesson, related_topic, post_type)
VALUES
(1, 0, 0, 0, 'course'),
(2, 1, 0, 0, 'lesson'),
(3, 1, 2, 0, 'lesson-topic'),
(4, 1, 2, 3, 'quiz'),
(5, 1, 2, 3, 'quiz'),
(6, 1, 2, 3, 'quiz'),
(7, 1, 2, 0, 'lesson-topic'),
(8, 1, 0, 0, 'forum'),
(9, 1, 0, 0, 'forum'),
(10, 1, 2, 3, 'topic'),
(11, 1, 2, 0, 'lesson-topic'),
(12, 1, 2, 11, 'quiz'),
(13, 1, 0, 0, 'lesson'),
(14, 1, 13, 0, 'lesson-topic'),
(15, 1, 13, 14, 'topic');
So you see in the fiddle examples of taxonomy queries, and example of a query I can do with a custom table.
The thing is, I have no expertise whatsoever in databases. I have no idea if my custom table approach is just a disaster waiting to happen. I don't want to reinvent anything - I am just tired of WordPress's approach of lumping data together and forcing me to query among tons of rows that are of no importance to my query of interest.
Given how often people have to solve this sort of problem, is there a tried and true, efficient, scalable approach to let me extract the sort of relational data I mention above?
I will add to the fiddle any other proposed solutions that come about here. I guess ideally I would have to insert thousands of rows into all tables to really see how queries perform, but maybe some expert immediately sees an optimal approach?
The biggest pitfall I see with my custom table is that it gives no flexibility given the column names I have chosen. If that is really a terrible approach, then maybe I just go with WordPress's taxonomies...
Upvotes: 1
Views: 208
Reputation: 33708
These obstacles are intellectual; matters of understanding, not matters of implementation.
Noting the Relational Database
tag.
Yes, a Relational database will give you:
(a) the structure you are seeking, in straight-forward logic, as well as:
(b) the easiest code and the fastest queries
(that which you have requested, plus any that you dream up in the future).
.
But you have to use Relational concepts (the Relational Model by Dr E F Codd).
If you use the concepts marketed by Date; Darwen; et al, which is marketed as "relational", both the structure [a], and the code for navigation and queries [b] will be horrendous. Additionally, you will not have Relational Integrity; or Relational Power; or Relational Speed that is possible in database that complies with Codd's Relational Model.
The Relational Model is Hierarchic.
Three types of Data Hierarchies are formally (by prior intent and design) in the Relational Model.
Your requirement is the simplest, four tables in a simple data hierarchy, which is reflected in the Relational Key (more, later).
The second type is a Tree (Single-Parent). You have that in Post, thus it needs no further explanation..
The third type is a Tree for Ancestors and a Tree for Descendants, without duplicating any data. Usually called a Bill of Materials structure. We don't have to worry about that here, it is beyond the scope of the question.
Warning. The links given in the comments will take you down a complex garden path, that might possibly be relevant for the other two types of hierarchies, and in any case, very poor implementations thereof.
Eg. SQL has recursion; MySQL does not have recursion, so a hard-coded and limited method is given).
Eg. some of the methods implement the trees in concrete: if a branch is moved, the entire tree needs to be re-written.
The point is, you don't need it. (You might need it for Post, but not for the answer to your question.)
You have an excellent grasp of your data, which is required for the exercise. But you have fixed ways of thinking about that data, no doubt due to the other implementation methods that you have tried, and the requirements therein. So the demand here is to give those methods of perceiving the data up, and to follow the Relational requirement:
To differentiate
Date; Darwen; et al, take the Result Set (the output of a query on Relational tables) as the starting point, as the perspective to be used when analysing and modelling the data. Which is reversing the order of nature, and guaranteed to cripple the modelling exercise (the report you want for a particular purpose vs the data as data).Then, as if stuck in an Excel spreadsheet mindset, they add on a
Record ID
field to the Result Set. Such an act cripples the modelling exercise further, because one now the false notion that a physicalRecord ID
identifies the logical row. It does not.
If you can give that fixed perspective up, along with the additions that contaminate the problem, and cross over the line into the Relational paradigm:
You will not need the relations
table: it is totally redundant, because the Relational database supplies all relationships (that you have defined thus far).
You won't need the IDs
, which are always one additional field and one additional index on every file. And the horrendous consequential problems.
Additionally, you will obtain:
JOINs
of distal tables will not require intermediate tables to be JOINed
, etc); and If interested, some of my recent Answers contain details and examples of those benefits.
All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993.
My IDEF1X Introduction is essential reading for those who are new to the Relational Model or Relational data modelling.
Rather than the example data you have given at the top of your question (which is not really example data, but a simple definition of the Form that you would like the display of the data to take), examine the example data (which are, of course, Identifying Keys) that I have given (blue).
The Relational Model demands that the Keys are "made up from the data". The numbers are not.
The data hierarchy is both:
visual in the data model
and reflected in the Relational Keys.
Relational Keys
Course, Lesson, Topic, Quiz
are Key elements, short names, not necessarily codes (they would be codes in a large university), that are unique in the given context. The Title
would be the long name that shows up on screens for external users (prospects).
A Lesson
does not exist independently, it exists only in the context of a Course
. Therefore the Key for Lesson
is ( Course, Lesson
). An ID
does not uniquely identify a Lesson
(or anything else, for that matter). You might have more than one Lesson
across the Courses
.
A Topic
does not exist independently, it exists only in the context of a Course
and a Lesson
. Therefore the Key for Topic
is ( Course, Lesson, Topic
). Etc.
A Quiz
does not exist independently, it exists only in the context of a Course
, a Lesson
, and a Topic
. Therefore the Key for Quiz
is ( Course, Lesson, Topic, Quiz
). Etc.
Relational Integrity
Eg. A Quiz
is constrained to a particular Topic
, which is Identified by (Course, Lesson, Topic
).
Whereas a Record ID
based filing system (falsely marketed as "relational") cannot provide that, it can only constrain a Quiz
to any Topic
.
Of course, that applies to all the tables in the data hierarchy.
The relationship between Posts and the structure that is the answer to the question, is not at all clear, therefore it would be incorrect to model it. Nevertheless, it is important to you, so I have modelled it on a possibility basis, with reticence. If you define with more clarity, I will update the data model.
Dead simple, compared to that which you have in the fiddle, or that which is demanded by a Record ID
based filing system.
Get me all lessons for a course
SELECT Course,
Lesson,
Title -- Lesson Title
FROM Lesson
WHERE Course = @Course -- "Jazz"
Get me all quizzes for a course (which means all quizzes across all topics in the course)
SELECT Course,
Lesson,
Topic,
Quiz,
Title -- Quiz Title
FROM Quiz
WHERE Course = @Course -- "Jazz"
Get me all quizzes for a lesson (which means quizzes under topics for the lesson)
SELECT Course,
Lesson,
Topic,
Quiz,
Title -- Quiz Title
FROM Quiz
WHERE Course = @Course -- "Classical"
AND Lesson = @Lesson -- "Chords"
Let me demonstrate a bit of Relational Power, here the aspect is JOIN
power, eliminating intermediate tables (which are forced in Record ID
based filing systems).
Gimme all the Courses in which Licks are taught (meaning, in the
Title
of aTopic
)
SELECT DISTINCT Course.Title,
Topic.Topic,
Topic.Title
FROM Topic
JOIN Course ON Course.Course = Topic.Course
WHERE Title LIKE "%Lick%"
If Course
(short name) were enough to Identify the course, such that Course.Title
were not necessary in the result set, the Course
table can be eliminated from the query.
SELECT DISTINCT Course,
Topic,
Title
FROM Topic
WHERE Title LIKE "%Lick%"
Please feel free to ask questions, or to supply clarifications. If short, in the comments, otherwise update (edit) your question.
The thing that I am still not sure about is order of joins. With your courses that have some text in topic example, I would have thought that the JOIN is on topics. Maybe the order doesn't matter in case of INNER JOIN?
Minor coding error, Topic
Course
. Fixed.
Order of JOINs
Therefore the JOIN order in the code should be whatever is logical (refer to my code examples in other recent Answers).
If, not, then sure, whatever helps the mickey mouse freeware perform better. In that case, the Join order should be from the smallest populated table to the largest.
In a Relational database, almost all JOINs are INNER JOINs. OUTER JOINs are required only to capture optional rows (that may not exist) and to show that non-existence in the result set.
what do you think of taxonomies (like in WordPress) as a way of relating data? Just trying to figure out tradeoff between using what WordPress already has to offer and me doing pure custom approach.
Taxonomy in WP is pig poop. Same as the several methods that set up fragments of hierarchies, or build hierarchies in concrete, as given in the other link.
It is probably adequate, in that it provides the simple relation (not Relational) capabilities that most WP users would need. It is hopeless when you push that boundary. As you have found out.
You cannot do it in WP, full stop. Where you can do it, it is ordinary, not "custom".
Basically, the issue that is being raised, although not named, is System Architecture. For computer systems, the definition of Architecture is:
The appropriate composition [decomposition; segmentation; Normalisation] of code and data objects, and the deployment of said objects in appropriate [platform] locations.
Use the right tool for the job translates to: deploy the object in the right place.
The right place for all the data for the app is, in a single database. That would make life easy for all maintenance. You do not want bits here, and other bits elsewhere.
Remember, the database is a single recovery unit. Therefore, you need all rules; constraints; views; etc, and importantly all Transactions (the database API), in the database, and nowhere else.
Using a reliable SQL platform means security for the future. Once you have the database working correctly, you will never have to change it (expansion is a separate issue). It is isolated from the app, which is appropriate, and thus isolated from anything that changes in the app environment.
The corollary is:
you can mess with the WP taxonomy, which you know already is problematic; pushing its boundary, and therefore you know that if you ever have to tweak it, it will be problematic; and pushing its boundary.
And if WP ever change their taxonomy library to fix the problems, or to enhance the capability, you will have to change data objects; relationships; etc.
From where I sit, it is not a trade-off, it is an Architectural decision.
Upvotes: 2