user137621
user137621

Reputation:

Tips on Database schema

I have a database that tracks UK Horse races.

Race contains all the information for a particular race.

CREATE TABLE "race" (
  "id" INTEGER PRIMARY KEY AUTOINCREMENT,
  "date" TEXT NOT NULL,
  "time" TEXT NOT NULL,
  "name" TEXT NOT NULL,
  "class" INTEGER NOT NULL,
  "distance" INTEGER NOT NULL,
  "extra" TEXT NOT NULL,
  "going" TEXT NOT NULL,
  "handicap" INTEGER NOT NULL,
  "prize" REAL,
  "purse" REAL,
  "surface" TEXT NOT NULL,
  "type" TEXT NOT NULL,
  "course_id" INTEGER NOT NULL,
  "betfair_path" TEXT NOT NULL UNIQUE,
  "racingpost_id" INTEGER NOT NULL UNIQUE,
  UNIQUE("betfair_path", "racingpost_id")
);

A race can have many entries.

CREATE TABLE "entry" (
  "id" INTEGER PRIMARY KEY AUTOINCREMENT,
  "weight" INTEGER,
  "allowance" INTEGER,
  "horse_id" INTEGER NOT NULL,
  "jockey_id" INTEGER,
  "trainer_id" INTEGER,
  "race_id" INTEGER NOT NULL,
  UNIQUE("race_id", "horse_id")
);

An entry can have 0 or 1 runner. This takes into account non-runners, horses entered for a race but who failed to start.

CREATE TABLE "runner" (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT,
    "position" TEXT NOT NULL,
    "beaten" INTEGER,
    "isp" REAL NOT NULL,
    "bsp" REAL,
    "place" REAL,
  "over_weight" INTEGER,
    "entry_id" INTEGER NOT NULL UNIQUE
);

My question is

Is that actually the best way to store my Entry vs Runner data? Note: Entry data is always harvested in a single sweep, and runner (basically result) is found later.

Apologies if I am missing something obvious but I am now brain dead from coding this application.

Upvotes: 0

Views: 1521

Answers (3)

PerformanceDBA
PerformanceDBA

Reputation: 33708

Relational database tag, and you want advice on your schema as per title. Even though the single question is answered, you may have more tomorrow.

I couldn't make any sense of your three flat files, so I drew them up into what they might look like in a ▶Relational database◀, where the information is organised and queries are easy. Going brain dead is not unusual when the information remains in its complex form.

If you have not seen the Relational Modelling Standard, you might need the IDEF1X Notation.

Note, OwnerId, JockeyId, and TrainerId are all PersonIds. No use manufacturing new ones when there is a perfectly good unique one already sitting there in the table. Just rename it to reflect its Role, and the PK of the table that it is in (the relevance of this will become clear when you code).

MultipleSELECTSare nothing to be scared of, SQL is a cumbersome language but that is all we have. The problem is:

  • the complexity (necessary due to a bad model) of eachSELECT

  • and whether you learn and understand how to use subqueries or not.

    • Single level queries are obviously very limited, and will lead to procedural (row-by-row) processing instead of set-processing.

    • Single level queries result in huge result sets that then have to be beaten into submission using GROUP BY, etc. Not good for performance, churning through all that unwanted data; better to get only the data you really want.

Now the queries.

  1. When you are printing race forms, I think you will need the Position scheduled and advertised for the RaceEntry; it is not an element of a Runner.

  2. Now that we have gotten rid of those Ids all over the place, which force all sorts of unnecessary joins, we can join directly to the parents concerned (less joins). Eg. for the Race Form, which is only concerned with RaceEntry, for the Owner, you can join to directly to Person using WHERE OwnerId = Person.PersonId; no need to join HorseRegistered or Owner.

  3. LEFT and RIGHT joins are OUTER joins, which means the rows on one side may be missing. That method has been answered, and you will get Nulls, which you have to process later (more code and cycles). I do not think that is what you want, if you are filling forms or a web page.

  4. The concept here is to think is terms of Relational sets, not row-by-row processing. But you need a database for that. Now that we have a bit of Relational power in the beast, you can try this for the Race Result (not the Race Form), instead of procedural processing. These are Scalar Subqueries. For the passed Race Identifiers (the outer query is only concerned with a Race):

        SELECT  (SELECT ISNULL(Place, " ")
                FROM  Runner 
                WHERE RacecourseCode = RE.RacecourseCode
                AND   RaceDate       = RE.RaceDate
                AND   RaceNo         = RE.RaceNo
                AND   HorseId        = RE.HorseId) AS Finish,
            (SELECT ISNULL(Name, "SCRATCH")
                FROM  Runner R,
                      Horse  H
                WHERE R.RacecourseCode = RE.RacecourseCode
                AND   R.RaceDate       = RE.RaceDate
                AND   R.RaceNo         = RE.RaceNo
                AND   R.HorseId        = RE.HorseId
                AND   H.HorseId        = RE.HorseId) AS Horse,
            -- Details,
            (SELECT Name FROM Person WHERE PersonId = RE.TrainerId) AS Trainer,
            (SELECT Name FROM Person WHERE PersonId = RE.JockeyId) AS Jockey,
            ISP AS SP,
            Weight AS Wt
        FROM  RaceEntry RE
        WHERE RaceDate       = @RaceDate
        AND   RacecourseCode = @RacecourseCode  -- to print entire race form, 
        AND   RaceNo         = @RaceNo          -- remove these 2 lines
        ORDER BY Position

Upvotes: 1

jon_darkstar
jon_darkstar

Reputation: 16768

This matches entries and runners for a given race

SELECT E.*, R.*
FROM entry E LEFT JOIN runner R on R.entry_id = E.id
WHERE E.race_id = X

If the entry has no runner, then the R.* fields are all null. You can count such null fields to answer your first query (or perhaps more easily, subtract)

Upvotes: 0

Cameron Jordan
Cameron Jordan

Reputation: 759

Your schema looks reasonable. The key construct to use to address your SQL questions is LEFT JOIN, for example:

SELECT COUNT(entry.id) entry_count, COUNT(runner.id) runner_count 
FROM entry
LEFT JOIN runner ON runner.entry_id = entry.id
WHERE race_id = 1

From Wikipedia:

... a left outer join returns all the values from the left table, plus matched values from the right table (or NULL in case of no matching join predicate).

So in general for your schema, focus on the entry table and LEFT JOIN the runner table as needed.

Upvotes: 3

Related Questions