Reputation: 11
I am new to PostgreSQL.
In my Postgres 14.8 database I have a students table. When executing the query:
SELECT stud_id, batch_type, subject, jdate ,subj_id, sdtime, subj_hours
FROM studs
ORDER BY jdate DESC, subject DESC, stud_id ASC
LIMIT 15 OFFSET 0
I get the following correct result as expected:
Stud_id bpchar | Batch_type Character varying | Subject Character varying | Jdate date | Subj_id integer | Sdtime Timestamp without time zone | Subj_hours integer |
---|---|---|---|---|---|---|
3591f1gff960h09l | Residential | Software Systems Development 7 | null | 35 | 2024-03-20 12:42:55.040189 | 10 |
983e292ce1xy69c0 | Residential | Software Systems Development 5 | null | 35 | 2024-03-20 12:42:55.040189 | 14 |
7k992ad576842507 | Residential | Software Systems Development 4 | null | 35 | 2024-03-20 12:42:55.040189 | 12 |
b9ahibji8jgaeth2 | Residential | Software Systems Development 3 | null | 35 | 2024-03-20 12:42:55.040189 | 10 |
5758uh95376eb0b4 | Residential | Software Systems Development 2 | 2024-03-20 | 35 | 2024-03-20 12:42:55.040189 | 12 |
6ch9b2f7894740er | Residential | Software Systems Development | 2024-03-20 | 35 | 2024-03-20 12:42:55.040189 | 14 |
a877450586v3869h | Residential | Software Systems Development | 2024-03-20 | 35 | 2024-03-20 12:42:55.040189 | 14 |
78635a8s1586d95c | Residential | Software Systems Development | 2024-03-15 | 35 | 2024-03-20 12:42:55.040189 | 14 |
b58c7fbf57328468 | Residential | Software Systems Development | 2024-03-15 | 35 | 2024-03-20 12:42:55.040189 | 16 |
acb8g94351e84226 | Residential | Software Systems Development | 2024-02-25 | 35 | 2024-03-20 12:42:55.040189 | 15 |
fvhud7653fd85m1s | Residential | Software Systems Development | 2024-02-20 | 35 | 2024-03-20 12:42:55.040189 | 16 |
b7hf962ffb6he96j | Residential | Design and Technology | 2024-02-19 | 35 | 2024-03-20 12:42:55.040189 | 14 |
394ij84275c6e97d | Residential | Software Systems Development | 2024-01-15 | 35 | 2024-03-20 12:42:55.040189 | 16 |
The stud_id is char and subject is character varying. Basically I wanted to list all the rows where stud_id is greater than '983e292ce1xy69c0' with subject 'Software Systems Development 5'.Since subject is ordered DESC, stud_id ASC, I added the following Where condition to the query.
But by adding WHERE
condition in the query as below:
SELECT stud_id, batch_type, subject, jdate ,subj_id, sdtime, subj_hours form studs
WHERE subject < 'Software Systems Development 5'
AND stud_id > '983e292ce1xy69c0'
ORDER BY jdate DESC, subject DESC, stud_id ASC
LIMIT 15 OFFSET 0
I get erroneous result as follows:
Stud_id bpchar | Batch_type Character varying | Subject Character varying | Jdate date | Subj_id integer | Sdtime Timestamp without time zone | Subj_hours integer |
---|---|---|---|---|---|---|
b9ahibji8jgaeth2 | Residential | Software Systems Development 3 | null | 35 | 2024-03-20 12:42:55.040189 | 10 |
a877450586v3869h | Residential | Software Systems Development | 2024-03-20 | 35 | 2024-03-20 12:42:55.040189 | 14 |
b58c7fbf57328468 | Residential | Software Systems Development | 2024-03-15 | 35 | 2024-03-20 12:42:55.040189 | 16 |
acb8g94351e84226 | Residential | Software Systems Development | 2024-02-25 | 35 | 2024-03-20 12:42:55.040189 | 15 |
fvhud7653fd85m1s | Residential | Software Systems Development | 2024-02-20 | 35 | 2024-03-20 12:42:55.040189 | 16 |
b7hf962ffb6he96j | Residential | Design and Technology | 2024-02-19 | 35 | 2024-03-20 12:42:55.040189 | 14 |
but I was expecting the following result:
Stud_id bpchar | Batch_type Character varying | Subject Character varying | Jdate date | Subj_id integer | Sdtime Timestamp without time zone | Subj_hours integer |
---|---|---|---|---|---|---|
7k992ad576842507 | Residential | Software Systems Development 4 | null | 35 | 2024-03-20 12:42:55.040189 | 12 |
b9ahibji8jgaeth2 | Residential | Software Systems Development 3 | null | 35 | 2024-03-20 12:42:55.040189 | 10 |
5758uh95376eb0b4 | Residential | Software Systems Development 2 | 2024-03-20 | 35 | 2024-03-20 12:42:55.040189 | 12 |
6ch9b2f7894740er | Residential | Software Systems Development | 2024-03-20 | 35 | 2024-03-20 12:42:55.040189 | 14 |
a877450586v3869h | Residential | Software Systems Development | 2024-03-20 | 35 | 2024-03-20 12:42:55.040189 | 14 |
78635a8s1586d95c | Residential | Software Systems Development | 2024-03-15 | 35 | 2024-03-20 12:42:55.040189 | 14 |
b58c7fbf57328468 | Residential | Software Systems Development | 2024-03-15 | 35 | 2024-03-20 12:42:55.040189 | 16 |
acb8g94351e84226 | Residential | Software Systems Development | 2024-02-25 | 35 | 2024-03-20 12:42:55.040189 | 15 |
fvhud7653fd85m1s | Residential | Software Systems Development | 2024-02-20 | 35 | 2024-03-20 12:42:55.040189 | 16 |
b7hf962ffb6he96j | Residential | Design and Technology | 2024-02-19 | 35 | 2024-03-20 12:42:55.040189 | 14 |
394ij84275c6e97d | Residential | Software Systems Development | 2024-01-15 | 35 | 2024-03-20 12:42:55.040189 | 16 |
Some rows are missed in the result. Can someone help me to understand the reason and how to fix it?
Upvotes: 1
Views: 99
Reputation: 658472
It seems you want rows starting from the 4th row in the first result. A typical use case for "keyset pagination". See:
However, your sort order is dictated by:
ORDER BY join_date DESC, subject DESC, stud_id ASC
Two issues prevent a simple solution with ROW value comparison:
jdate
has null values.ASC
and DESC
). See:
Without these issues, this would work:
SELECT ...
FROM studs
WHERE (jdate, subject, stud_id)
< ('2024-03-20', 'Software Systems Development 5', '983e292ce1xy69c0')
ORDER BY jdate DESC, subject DESC, stud_id DESC -- adapted !!
LIMIT 15;
Very fast with a unique (!) index on studs(jdate, subject, stud_id)
.
While there are null values:
Are you aware that unique indices with null values are possible since Postgres 15?
Are you aware of the NULLS LAST
clause?
But ROW value comparison still fails with any null values.
Stuck with your unfortunate setup, only (comparatively) slow and tedious workarounds remain. For your original sort order:
ORDER BY jdate DESC, subject DESC, stud_id ASC
To make the best of it, fork 2 cases based on filter values:
$jdate
is NULL
WHERE subject < $subject
OR subject = $subject AND stud_id > $stud_id
$jdate
is NOT NULL
Use partial ROW value comparison for the two leading columns:
WHERE (jdate, subject) < ($jdate, $subject)
OR (jdate, subject) = ($jdate, $subject) AND stud_id > $stud_id
Support that with an index on studs(jdate, subject)
. And, possibly, another on studs(subject)
. If there are not too many dupes it will still be fast. See:
You can combine both into a single query, but that will be slower for all.
WHERE CASE WHEN $jdate IS NULL
THEN subject < $subject
OR subject = $subject AND stud_id > $stud_id
ELSE (jdate, subject) < ($jdate, $subject)
OR (jdate, subject) = ($jdate, $subject) AND stud_id > $stud_id
END
I formatted to (hopefully) make the case clear.
We don't require additional parentheses because operator precedence works in out favor. (AND
binds before OR
)
Fix null values and set the column jdate
to NOT NULL
.
And sort by stud_id DESC
.
Or find some way to create a multicolumn expression index with an inverted stud_id
value. (Should be possible if the value is of numeric nature.) And use a matching query. Find an example here:
Upvotes: 0