Brian
Brian

Reputation: 581

Defining database relations among various content types for small website - advice on structure?

This relates to a post I made here

https://wordpress.stackexchange.com/questions/353988/setting-up-taxonomy-to-relate-various-custom-post-types-advice-on-structuring

where initially I was interested in WordPress taxonomies, but the more I think about my case, the more I think a custom table approach makes sense.

So I am curious if there are any database experts who can advise me on table structure given my scenario:

I am setting up my site so I can teach guitar courses. Perhaps of importance, I will not have many courses. Maybe 4-8.

Under a given course, I have

Course 1
   Lesson 1.1
      Topic 1.1.1
         Quiz 1.1.1.1
         Quiz 1.1.1.2
         Forum Topic 1.1.1.1
      Topic 1.1.2
         Quiz 1.1.2.1
         Exercise 1.1.2.1
      etc

Course 2
   Lesson 2.1
      Topic 2.1.1
         Quiz 2.1.1.1
         Forum Topic 2.1.1.1
      Topic 2.1.2
         etc
      etc

I also use a course forum (BBPress, which has its own taxonomy system), and at times need to tie individual forum topics to lesson topics.

The types of queries I need to do are

You probably get the idea.

Here is example dataset:

CREATE TABLE `posts` (
  `ID` bigint(20) UNSIGNED NOT NULL,
  `post_title` text COLLATE utf8mb4_unicode_520_ci NOT NULL,
  `post_parent` bigint(20) UNSIGNED NOT NULL DEFAULT '0',
  `post_type` varchar(20) COLLATE utf8mb4_unicode_520_ci NOT NULL DEFAULT 'post'
) ;

INSERT INTO posts
    (ID, post_type, post_title, post_parent)
VALUES
    (1, 'course', 'Course 1', 0),
    (2, 'lesson', 'Lesson 1.1', 0),
    (3, 'course-topic', 'Topic 1.1.1', 0),
    (4, 'quiz', 'Quiz 1.1.1.1', 0),
    (5, 'quiz', 'Quiz 1.1.1.2', 0),
    (6, 'quiz', 'Quiz 1.1.1.3', 0),
    (7, 'course-topic', 'Assignment 1.1.1.1', 0),
    (8, 'forum', 'Course 1 Forum', 0),
    (9, 'forum', 'Course 1 Discussions', 8),
    (10, 'topic', 'Discussion 1.1.1.1', 9),
    (11, 'course-topic', 'Topic 1.1.2', 0),
    (12, 'quiz', 'Quiz 1.1.2.1', 0),
    (13, 'lesson', 'Lesson 1.2', 0),
    (14, 'course-topic', 'Topic 1.2.1', 0),
    (15, 'topic', 'Discussion 1.2.1.1', 9);

Initially I set up taxonomy tables in WordPress and made a sqlfiddle with some example data and some example queries.

Here is the fiddle:

http://sqlfiddle.com/#!9/ffa734/3

A query just to get lessons for a given course ID involves 4 tables and looks like

-- get all lessons for a given course
SELECT course_rels.object_id as course_id, lesson.post_title as lesson_title FROM posts AS lesson
INNER JOIN wp_term_relationships AS lesson_rels ON lesson.ID = lesson_rels.object_id
INNER JOIN wp_term_relationships AS course_rels ON course_rels.object_id = 1
INNER JOIN wp_term_taxonomy AS lesson_tax 
   ON 
       (lesson_tax.term_taxonomy_id = lesson_rels.term_taxonomy_id
       AND
       lesson_tax.parent = course_rels.term_taxonomy_id)
INNER JOIN wp_terms AS terms ON terms.term_id = lesson_tax.term_id
WHERE terms.name = 'Lesson';

After I realized how awkward the queries become using taxonomies, I decided to make a custom table which given post data above, would look like

CREATE TABLE `relations` (
  `post_id` bigint(20) UNSIGNED NOT NULL,
  `related_course` bigint(20) UNSIGNED NOT NULL DEFAULT '0',
  `related_lesson` bigint(20) UNSIGNED NOT NULL DEFAULT '0',
  `related_topic` bigint(20) UNSIGNED NOT NULL DEFAULT '0',
  `post_type` varchar(16) COLLATE utf8mb4_unicode_520_ci NOT NULL DEFAULT ''
);
INSERT INTO relations
    (post_id, related_course, related_lesson, related_topic, post_type)
VALUES
    (1, 0, 0, 0, 'course'),  
    (2, 1, 0, 0, 'lesson'), 
    (3, 1, 2, 0, 'lesson-topic'), 
    (4, 1, 2, 3, 'quiz'),
    (5, 1, 2, 3, 'quiz'),
    (6, 1, 2, 3, 'quiz'),
    (7, 1, 2, 0, 'lesson-topic'),
    (8, 1, 0, 0, 'forum'),
    (9, 1, 0, 0, 'forum'),
    (10, 1, 2, 3, 'topic'),
    (11, 1, 2, 0, 'lesson-topic'),
    (12, 1, 2, 11, 'quiz'),
    (13, 1, 0, 0, 'lesson'),
    (14, 1, 13, 0, 'lesson-topic'),
    (15, 1, 13, 14, 'topic');

So you see in the fiddle examples of taxonomy queries, and example of a query I can do with a custom table.

The thing is, I have no expertise whatsoever in databases. I have no idea if my custom table approach is just a disaster waiting to happen. I don't want to reinvent anything - I am just tired of WordPress's approach of lumping data together and forcing me to query among tons of rows that are of no importance to my query of interest.

Given how often people have to solve this sort of problem, is there a tried and true, efficient, scalable approach to let me extract the sort of relational data I mention above?

I will add to the fiddle any other proposed solutions that come about here. I guess ideally I would have to insert thousands of rows into all tables to really see how queries perform, but maybe some expert immediately sees an optimal approach?

The biggest pitfall I see with my custom table is that it gives no flexibility given the column names I have chosen. If that is really a terrible approach, then maybe I just go with WordPress's taxonomies...

Upvotes: 1

Views: 208

Answers (1)

PerformanceDBA
PerformanceDBA

Reputation: 33708

Obstacle

These obstacles are intellectual; matters of understanding, not matters of implementation.

  1. Noting the Relational Database tag.
    Yes, a Relational database will give you:

    • (a) the structure you are seeking, in straight-forward logic, as well as:

    • (b) the easiest code and the fastest queries
      (that which you have requested, plus any that you dream up in the future).
      .

    But you have to use Relational concepts (the Relational Model by Dr E F Codd).

    If you use the concepts marketed by Date; Darwen; et al, which is marketed as "relational", both the structure [a], and the code for navigation and queries [b] will be horrendous. Additionally, you will not have Relational Integrity; or Relational Power; or Relational Speed that is possible in database that complies with Codd's Relational Model.

  2. The Relational Model is Hierarchic.
    Three types of Data Hierarchies are formally (by prior intent and design) in the Relational Model.

    1. Your requirement is the simplest, four tables in a simple data hierarchy, which is reflected in the Relational Key (more, later).

    2. The second type is a Tree (Single-Parent). You have that in Post, thus it needs no further explanation..

    3. The third type is a Tree for Ancestors and a Tree for Descendants, without duplicating any data. Usually called a Bill of Materials structure. We don't have to worry about that here, it is beyond the scope of the question.

    Warning. The links given in the comments will take you down a complex garden path, that might possibly be relevant for the other two types of hierarchies, and in any case, very poor implementations thereof.

    • Eg. SQL has recursion; MySQL does not have recursion, so a hard-coded and limited method is given).

    • Eg. some of the methods implement the trees in concrete: if a branch is moved, the entire tree needs to be re-written.

    The point is, you don't need it. (You might need it for Post, but not for the answer to your question.)

  3. You have an excellent grasp of your data, which is required for the exercise. But you have fixed ways of thinking about that data, no doubt due to the other implementation methods that you have tried, and the requirements therein. So the demand here is to give those methods of perceiving the data up, and to follow the Relational requirement:

    • perceive the data, as data, and nothing but data
      No additional columns or structures

    To differentiate
    Date; Darwen; et al, take the Result Set (the output of a query on Relational tables) as the starting point, as the perspective to be used when analysing and modelling the data. Which is reversing the order of nature, and guaranteed to cripple the modelling exercise (the report you want for a particular purpose vs the data as data).

    Then, as if stuck in an Excel spreadsheet mindset, they add on a Record ID field to the Result Set. Such an act cripples the modelling exercise further, because one now the false notion that a physical Record ID identifies the logical row. It does not.

Relational Data Model

If you can give that fixed perspective up, along with the additions that contaminate the problem, and cross over the line into the Relational paradigm:

  • You will not need the relations table: it is totally redundant, because the Relational database supplies all relationships (that you have defined thus far).

  • You won't need the IDs, which are always one additional field and one additional index on every file. And the horrendous consequential problems.

Additionally, you will obtain:

  • Relational Integrity (which is logical, as distinct from Referential Integrity, which is the physical feature in SQL);
  • Relational Power (your navigation and query code will be dead simple, JOINs of distal tables will not require intermediate tables to be JOINed, etc); and
  • Relational Speed (fewest indices).

If interested, some of my recent Answers contain details and examples of those benefits.

BrianTA

Notation

  • All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993.

  • My IDEF1X Introduction is essential reading for those who are new to the Relational Model or Relational data modelling.

Content

  • Rather than the example data you have given at the top of your question (which is not really example data, but a simple definition of the Form that you would like the display of the data to take), examine the example data (which are, of course, Identifying Keys) that I have given (blue).

    • The numbers are redundant. if actually necessary, they can be computed on the fly
    • Do not use numbers as Keys, because, when the structure changes, you will have to renumber all your Keys. Totally unnecessary.
  • The Relational Model demands that the Keys are "made up from the data". The numbers are not.

  • The data hierarchy is both:

    • visual in the data model

    • and reflected in the Relational Keys.

  • Relational Keys

    • are composites, get used to it. SQL has been handling composite Keys since 1984.
    • define and implement the data hierarchy. No surprise, because the Relational Model is hierarchic.
    • Course, Lesson, Topic, Quiz are Key elements, short names, not necessarily codes (they would be codes in a large university), that are unique in the given context.
    • The Title would be the long name that shows up on screens for external users (prospects).

    • A Lesson does not exist independently, it exists only in the context of a Course. Therefore the Key for Lesson is ( Course, Lesson ). An ID does not uniquely identify a Lesson (or anything else, for that matter). You might have more than one Lesson across the Courses.

    • A Topic does not exist independently, it exists only in the context of a Course and a Lesson. Therefore the Key for Topic is ( Course, Lesson, Topic ). Etc.

    • A Quiz does not exist independently, it exists only in the context of a Course, a Lesson, and a Topic. Therefore the Key for Quiz is ( Course, Lesson, Topic, Quiz ). Etc.

  • Relational Integrity

    • Eg. A Quiz is constrained to a particular Topic, which is Identified by (Course, Lesson, Topic).

    • Whereas a Record ID based filing system (falsely marketed as "relational") cannot provide that, it can only constrain a Quiz to any Topic.

    • Of course, that applies to all the tables in the data hierarchy.

  • The relationship between Posts and the structure that is the answer to the question, is not at all clear, therefore it would be incorrect to model it. Nevertheless, it is important to you, so I have modelled it on a possibility basis, with reticence. If you define with more clarity, I will update the data model.

Query

Dead simple, compared to that which you have in the fiddle, or that which is demanded by a Record ID based filing system.

Get me all lessons for a course

SELECT  Course,
        Lesson,
        Title              -- Lesson Title
    FROM Lesson
    WHERE Course = @Course -- "Jazz"

Get me all quizzes for a course (which means all quizzes across all topics in the course)

SELECT  Course,
        Lesson,
        Topic,
        Quiz,
        Title              -- Quiz Title
    FROM Quiz
    WHERE Course = @Course -- "Jazz"

Get me all quizzes for a lesson (which means quizzes under topics for the lesson)

SELECT  Course,
        Lesson,
        Topic,
        Quiz,
        Title                 -- Quiz Title
    FROM Quiz
    WHERE Course = @Course    -- "Classical"
        AND Lesson = @Lesson  -- "Chords"

Let me demonstrate a bit of Relational Power, here the aspect is JOIN power, eliminating intermediate tables (which are forced in Record ID based filing systems).

Gimme all the Courses in which Licks are taught (meaning, in the Title of a Topic)

SELECT  DISTINCT Course.Title,
        Topic.Topic,
        Topic.Title
    FROM Topic
        JOIN Course ON Course.Course = Topic.Course
    WHERE Title LIKE "%Lick%"
  • If Course (short name) were enough to Identify the course, such that Course.Title were not necessary in the result set, the Course table can be eliminated from the query.

    SELECT  DISTINCT Course,
            Topic,
            Title
        FROM Topic
        WHERE Title LIKE "%Lick%"
    

Please feel free to ask questions, or to supply clarifications. If short, in the comments, otherwise update (edit) your question.


Comments

The thing that I am still not sure about is order of joins. With your courses that have some text in topic example, I would have thought that the JOIN is on topics. Maybe the order doesn't matter in case of INNER JOIN?

  1. Minor coding error, TopicCourse. Fixed.

  2. Order of JOINs

    • If you have a commercial SQL platform, the order of JOINs is irrelevant, because the Query Optimiser will choose a QueryPlan based on Statistics, not on whatever is in the code, ordered or not.
    • Therefore the JOIN order in the code should be whatever is logical (refer to my code examples in other recent Answers).

    • If, not, then sure, whatever helps the mickey mouse freeware perform better. In that case, the Join order should be from the smallest populated table to the largest.

  3. In a Relational database, almost all JOINs are INNER JOINs. OUTER JOINs are required only to capture optional rows (that may not exist) and to show that non-existence in the result set.

what do you think of taxonomies (like in WordPress) as a way of relating data? Just trying to figure out tradeoff between using what WordPress already has to offer and me doing pure custom approach.

  • Taxonomy in WP is pig poop. Same as the several methods that set up fragments of hierarchies, or build hierarchies in concrete, as given in the other link.

    • It is probably adequate, in that it provides the simple relation (not Relational) capabilities that most WP users would need. It is hopeless when you push that boundary. As you have found out.

    • You cannot do it in WP, full stop. Where you can do it, it is ordinary, not "custom".

  • Basically, the issue that is being raised, although not named, is System Architecture. For computer systems, the definition of Architecture is:

    • The appropriate composition [decomposition; segmentation; Normalisation] of code and data objects, and the deployment of said objects in appropriate [platform] locations.

    • Use the right tool for the job translates to: deploy the object in the right place.

  • The right place for all the data for the app is, in a single database. That would make life easy for all maintenance. You do not want bits here, and other bits elsewhere.

  • Remember, the database is a single recovery unit. Therefore, you need all rules; constraints; views; etc, and importantly all Transactions (the database API), in the database, and nowhere else.

  • Using a reliable SQL platform means security for the future. Once you have the database working correctly, you will never have to change it (expansion is a separate issue). It is isolated from the app, which is appropriate, and thus isolated from anything that changes in the app environment.

The corollary is:

  • you can mess with the WP taxonomy, which you know already is problematic; pushing its boundary, and therefore you know that if you ever have to tweak it, it will be problematic; and pushing its boundary.

  • And if WP ever change their taxonomy library to fix the problems, or to enhance the capability, you will have to change data objects; relationships; etc.

From where I sit, it is not a trade-off, it is an Architectural decision.


Upvotes: 2

Related Questions