Reputation:
I have a database that tracks UK Horse races.
Race contains all the information for a particular race.
CREATE TABLE "race" (
"id" INTEGER PRIMARY KEY AUTOINCREMENT,
"date" TEXT NOT NULL,
"time" TEXT NOT NULL,
"name" TEXT NOT NULL,
"class" INTEGER NOT NULL,
"distance" INTEGER NOT NULL,
"extra" TEXT NOT NULL,
"going" TEXT NOT NULL,
"handicap" INTEGER NOT NULL,
"prize" REAL,
"purse" REAL,
"surface" TEXT NOT NULL,
"type" TEXT NOT NULL,
"course_id" INTEGER NOT NULL,
"betfair_path" TEXT NOT NULL UNIQUE,
"racingpost_id" INTEGER NOT NULL UNIQUE,
UNIQUE("betfair_path", "racingpost_id")
);
A race can have many entries.
CREATE TABLE "entry" (
"id" INTEGER PRIMARY KEY AUTOINCREMENT,
"weight" INTEGER,
"allowance" INTEGER,
"horse_id" INTEGER NOT NULL,
"jockey_id" INTEGER,
"trainer_id" INTEGER,
"race_id" INTEGER NOT NULL,
UNIQUE("race_id", "horse_id")
);
An entry can have 0 or 1 runner. This takes into account non-runners, horses entered for a race but who failed to start.
CREATE TABLE "runner" (
"id" INTEGER PRIMARY KEY AUTOINCREMENT,
"position" TEXT NOT NULL,
"beaten" INTEGER,
"isp" REAL NOT NULL,
"bsp" REAL,
"place" REAL,
"over_weight" INTEGER,
"entry_id" INTEGER NOT NULL UNIQUE
);
My question is
Is that actually the best way to store my Entry vs Runner data? Note: Entry data is always harvested in a single sweep, and runner (basically result) is found later.
Apologies if I am missing something obvious but I am now brain dead from coding this application.
Upvotes: 0
Views: 1521
Reputation: 33708
Relational database tag, and you want advice on your schema as per title. Even though the single question is answered, you may have more tomorrow.
I couldn't make any sense of your three flat files, so I drew them up into what they might look like in a ▶Relational database◀, where the information is organised and queries are easy. Going brain dead is not unusual when the information remains in its complex form.
If you have not seen the Relational Modelling Standard, you might need the IDEF1X Notation.
Note, OwnerId
, JockeyId
, and TrainerId
are all PersonIds
. No use manufacturing new ones when there is a perfectly good unique one already sitting there in the table. Just rename it to reflect its Role, and the PK of the table that it is in (the relevance of this will become clear when you code).
MultipleSELECTS
are nothing to be scared of, SQL is a cumbersome language but that is all we have. The problem is:
the complexity (necessary due to a bad model) of eachSELECT
and whether you learn and understand how to use subqueries or not.
Single level queries are obviously very limited, and will lead to procedural (row-by-row) processing instead of set-processing.
Single level queries result in huge result sets that then have to be beaten into submission using GROUP BY
, etc. Not good for performance, churning through all that unwanted data; better to get only the data you really want.
Now the queries.
When you are printing race forms, I think you will need the Position
scheduled and advertised for the RaceEntry
; it is not an element of a Runner
.
Now that we have gotten rid of those Ids
all over the place, which force all sorts of unnecessary joins, we can join directly to the parents concerned (less joins). Eg. for the Race Form, which is only concerned with RaceEntry
, for the Owner, you can join to directly to Person
using WHERE OwnerId = Person.PersonId
; no need to join HorseRegistered
or Owner
.
LEFT and RIGHT joins are OUTER joins, which means the rows on one side may be missing. That method has been answered, and you will get Nulls, which you have to process later (more code and cycles). I do not think that is what you want, if you are filling forms or a web page.
The concept here is to think is terms of Relational sets, not row-by-row processing. But you need a database for that. Now that we have a bit of Relational power in the beast, you can try this for the Race Result (not the Race Form), instead of procedural processing. These are Scalar Subqueries. For the passed Race
Identifiers (the outer query is only concerned with a Race
):
SELECT (SELECT ISNULL(Place, " ")
FROM Runner
WHERE RacecourseCode = RE.RacecourseCode
AND RaceDate = RE.RaceDate
AND RaceNo = RE.RaceNo
AND HorseId = RE.HorseId) AS Finish,
(SELECT ISNULL(Name, "SCRATCH")
FROM Runner R,
Horse H
WHERE R.RacecourseCode = RE.RacecourseCode
AND R.RaceDate = RE.RaceDate
AND R.RaceNo = RE.RaceNo
AND R.HorseId = RE.HorseId
AND H.HorseId = RE.HorseId) AS Horse,
-- Details,
(SELECT Name FROM Person WHERE PersonId = RE.TrainerId) AS Trainer,
(SELECT Name FROM Person WHERE PersonId = RE.JockeyId) AS Jockey,
ISP AS SP,
Weight AS Wt
FROM RaceEntry RE
WHERE RaceDate = @RaceDate
AND RacecourseCode = @RacecourseCode -- to print entire race form,
AND RaceNo = @RaceNo -- remove these 2 lines
ORDER BY Position
Upvotes: 1
Reputation: 16768
This matches entries and runners for a given race
SELECT E.*, R.*
FROM entry E LEFT JOIN runner R on R.entry_id = E.id
WHERE E.race_id = X
If the entry has no runner, then the R.* fields are all null. You can count such null fields to answer your first query (or perhaps more easily, subtract)
Upvotes: 0
Reputation: 759
Your schema looks reasonable. The key construct to use to address your SQL questions is LEFT JOIN
, for example:
SELECT COUNT(entry.id) entry_count, COUNT(runner.id) runner_count
FROM entry
LEFT JOIN runner ON runner.entry_id = entry.id
WHERE race_id = 1
From Wikipedia:
... a left outer join returns all the values from the left table, plus matched values from the right table (or NULL in case of no matching join predicate).
So in general for your schema, focus on the entry
table and LEFT JOIN
the runner
table as needed.
Upvotes: 3