qquest
qquest

Reputation: 11

Fetching unexpected results from WHRE clause with bpchar and character varying fields

I am new to PostgreSQL.
In my Postgres 14.8 database I have a students table. When executing the query:

SELECT stud_id, batch_type, subject, jdate ,subj_id, sdtime, subj_hours
FROM studs 
ORDER BY jdate DESC, subject DESC, stud_id ASC
LIMIT 15 OFFSET 0

I get the following correct result as expected:

Stud_id bpchar Batch_type Character varying Subject Character varying Jdate date Subj_id integer Sdtime Timestamp without time zone Subj_hours integer
3591f1gff960h09l Residential Software Systems Development 7 null 35 2024-03-20 12:42:55.040189 10
983e292ce1xy69c0 Residential Software Systems Development 5 null 35 2024-03-20 12:42:55.040189 14
7k992ad576842507 Residential Software Systems Development 4 null 35 2024-03-20 12:42:55.040189 12
b9ahibji8jgaeth2 Residential Software Systems Development 3 null 35 2024-03-20 12:42:55.040189 10
5758uh95376eb0b4 Residential Software Systems Development 2 2024-03-20 35 2024-03-20 12:42:55.040189 12
6ch9b2f7894740er Residential Software Systems Development 2024-03-20 35 2024-03-20 12:42:55.040189 14
a877450586v3869h Residential Software Systems Development 2024-03-20 35 2024-03-20 12:42:55.040189 14
78635a8s1586d95c Residential Software Systems Development 2024-03-15 35 2024-03-20 12:42:55.040189 14
b58c7fbf57328468 Residential Software Systems Development 2024-03-15 35 2024-03-20 12:42:55.040189 16
acb8g94351e84226 Residential Software Systems Development 2024-02-25 35 2024-03-20 12:42:55.040189 15
fvhud7653fd85m1s Residential Software Systems Development 2024-02-20 35 2024-03-20 12:42:55.040189 16
b7hf962ffb6he96j Residential Design and Technology 2024-02-19 35 2024-03-20 12:42:55.040189 14
394ij84275c6e97d Residential Software Systems Development 2024-01-15 35 2024-03-20 12:42:55.040189 16

The stud_id is char and subject is character varying. Basically I wanted to list all the rows where stud_id is greater than '983e292ce1xy69c0' with subject 'Software Systems Development 5'.Since subject is ordered DESC, stud_id ASC, I added the following Where condition to the query.

But by adding WHERE condition in the query as below:

SELECT stud_id, batch_type, subject, jdate ,subj_id, sdtime, subj_hours form studs
WHERE subject < 'Software Systems Development 5'
AND stud_id > '983e292ce1xy69c0'
ORDER BY jdate DESC, subject DESC, stud_id ASC
LIMIT 15 OFFSET 0

I get erroneous result as follows:

Stud_id bpchar Batch_type Character varying Subject Character varying Jdate date Subj_id integer Sdtime Timestamp without time zone Subj_hours integer
b9ahibji8jgaeth2 Residential Software Systems Development 3 null 35 2024-03-20 12:42:55.040189 10
a877450586v3869h Residential Software Systems Development 2024-03-20 35 2024-03-20 12:42:55.040189 14
b58c7fbf57328468 Residential Software Systems Development 2024-03-15 35 2024-03-20 12:42:55.040189 16
acb8g94351e84226 Residential Software Systems Development 2024-02-25 35 2024-03-20 12:42:55.040189 15
fvhud7653fd85m1s Residential Software Systems Development 2024-02-20 35 2024-03-20 12:42:55.040189 16
b7hf962ffb6he96j Residential Design and Technology 2024-02-19 35 2024-03-20 12:42:55.040189 14

but I was expecting the following result:

Stud_id bpchar Batch_type Character varying Subject Character varying Jdate date Subj_id integer Sdtime Timestamp without time zone Subj_hours integer
7k992ad576842507 Residential Software Systems Development 4 null 35 2024-03-20 12:42:55.040189 12
b9ahibji8jgaeth2 Residential Software Systems Development 3 null 35 2024-03-20 12:42:55.040189 10
5758uh95376eb0b4 Residential Software Systems Development 2 2024-03-20 35 2024-03-20 12:42:55.040189 12
6ch9b2f7894740er Residential Software Systems Development 2024-03-20 35 2024-03-20 12:42:55.040189 14
a877450586v3869h Residential Software Systems Development 2024-03-20 35 2024-03-20 12:42:55.040189 14
78635a8s1586d95c Residential Software Systems Development 2024-03-15 35 2024-03-20 12:42:55.040189 14
b58c7fbf57328468 Residential Software Systems Development 2024-03-15 35 2024-03-20 12:42:55.040189 16
acb8g94351e84226 Residential Software Systems Development 2024-02-25 35 2024-03-20 12:42:55.040189 15
fvhud7653fd85m1s Residential Software Systems Development 2024-02-20 35 2024-03-20 12:42:55.040189 16
b7hf962ffb6he96j Residential Design and Technology 2024-02-19 35 2024-03-20 12:42:55.040189 14
394ij84275c6e97d Residential Software Systems Development 2024-01-15 35 2024-03-20 12:42:55.040189 16

Some rows are missed in the result. Can someone help me to understand the reason and how to fix it?

Upvotes: 1

Views: 99

Answers (1)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 658472

Keyset pagination with ROW value comparison

It seems you want rows starting from the 4th row in the first result. A typical use case for "keyset pagination". See:

However, your sort order is dictated by:

ORDER BY join_date DESC, subject DESC, stud_id ASC

Two issues prevent a simple solution with ROW value comparison:

  1. jdate has null values.
  2. A mix of opposing sort orders (ASC and DESC). See:

Without these issues, this would work:

SELECT ...
FROM   studs
WHERE  (jdate, subject, stud_id)
     < ('2024-03-20', 'Software Systems Development 5', '983e292ce1xy69c0')
ORDER  BY jdate DESC, subject DESC, stud_id DESC  -- adapted !!
LIMIT  15;

Very fast with a unique (!) index on studs(jdate, subject, stud_id).

While there are null values:

Are you aware that unique indices with null values are possible since Postgres 15?

Are you aware of the NULLS LAST clause?

But ROW value comparison still fails with any null values.

Solution for your quandary

Stuck with your unfortunate setup, only (comparatively) slow and tedious workarounds remain. For your original sort order:

ORDER  BY jdate DESC, subject DESC, stud_id ASC

Fork cases

To make the best of it, fork 2 cases based on filter values:

  1. Filter value for $jdate is NULL
WHERE subject < $subject
OR    subject = $subject AND stud_id > $stud_id
  1. Filter value for $jdate is NOT NULL

Use partial ROW value comparison for the two leading columns:

WHERE (jdate, subject) < ($jdate, $subject)
OR    (jdate, subject) = ($jdate, $subject) AND stud_id > $stud_id

Support that with an index on studs(jdate, subject). And, possibly, another on studs(subject). If there are not too many dupes it will still be fast. See:

Combined solution (not recommended)

You can combine both into a single query, but that will be slower for all.

WHERE CASE WHEN $jdate IS NULL
           THEN         subject  <          $subject
             OR         subject  =          $subject  AND stud_id > $stud_id
           ELSE (jdate, subject) < ($jdate, $subject)
             OR (jdate, subject) = ($jdate, $subject) AND stud_id > $stud_id
      END

I formatted to (hopefully) make the case clear.
We don't require additional parentheses because operator precedence works in out favor. (AND binds before OR)

My advice

Fix null values and set the column jdate to NOT NULL.

And sort by stud_id DESC.
Or find some way to create a multicolumn expression index with an inverted stud_id value. (Should be possible if the value is of numeric nature.) And use a matching query. Find an example here:

Upvotes: 0

Related Questions