Urvin
Urvin

Reputation: 41

PostgreSQL - paginate by rows' group

I have a table with cars description:

create table car
(
  id serial constraint car_pk primary key,
  vendor_name varchar not null,
  model_name varchar not null,
  body_type varchar not null,
  specifications_name varchar not null,
  price int4 not null
);

Filled by the next data:

    INSERT INTO car(vendor_name, model_name, body_type, specifications_name, price) VALUES
('Peugeot', '408', 'Sedan', 'Allure 115hp brown', 1144000),
('LADA', 'Vesta', 'Sedan', 'Luxe seawave', 635000),
('Ford', 'Focus', 'Hatchback', 'Sync gray', 1109000),
('Ford', 'Focus', 'Sedan', 'Sync white', 1250800),
('LADA', 'Vesta', 'Sedan', 'Сlassic green', 631800),
('Audi', 'A4', 'Wagon', 'yellow', 2900000),
('Ford', 'Focus', 'Hatchback', 'Special tangerine', 1126000),
('LADA', 'Granta', 'Sedan', 'Comfort gray', 520000),
('LADA', 'Vesta', 'Sedan', 'Сomfort blue', 631100),
('Ford', 'Focus', 'Sedan', 'Trend blue', 1235000),
('LADA', 'Vesta', 'Wagon', 'Comfort orange', 679000),
('Audi', 'A4', 'Sedan', 'yellow', 2000000),
('LADA', 'Granta', 'Sedan', 'Luxe Prestige green', 576000),
('Peugeot', '408', 'Sedan', 'Active red', 1177000),
('Audi', 'A4', 'Sedan', 'yellow', 2000000),
('Ford', 'Focus', 'Sedan', 'Special tangerine', 1203000),
('LADA', 'Granta', 'Sedan', 'Luxe gray', 531000),
('Peugeot', '408', 'Sedan', 'Allure 150hp white', 1122000),
('Audi', 'A4', 'Wagon', 'gray', 2900000),
('LADA', 'Vesta', 'Wagon', 'Luxe white', 680000),
('Ford', 'Focus', 'Sedan', 'Special orange', 1211000),
('Ford', 'Focus', 'Hatchback', 'Special orange', 1125000),
('LADA', 'Vesta', 'Wagon', 'Comfort plum', 630000),
('Peugeot', '408', 'Sedan', 'Allure 150hp purple', 1125000),
('Audi', 'A3', 'HatchBack', 'white', 2000000),
('Ford', 'Focus', 'Hatchback', 'Special lemon', 1088000),
('LADA', 'Vesta', 'Wagon', 'Luxe blue', 699000),
('Ford', 'Focus', 'Sedan', 'Trend green', 1230000),
('LADA', 'Vesta', 'Sedan', 'Luxe dark green', 634000),
('Ford', 'Focus', 'Sedan', 'Sync gray', 1260000),
('LADA', 'Granta', 'Wagon', 'Comfort magenta', 566000),
('LADA', 'Granta', 'Sedan', 'Comfort red', 520000),
('LADA', 'Vesta', 'Sedan', 'Сlassic brown', 631000),
('Ford', 'Focus', 'Sedan', 'Special lemon', 1201000),
('Ford', 'Focus', 'Hatchback', 'Trend blue', 1065000),
('LADA', 'Vesta', 'Wagon', 'Luxe red', 679000),
('LADA', 'Granta', 'Wagon', 'Standart white', 520000),
('Audi', 'A4', 'Wagon', 'black', 3000000),
('LADA', 'Vesta', 'Sedan', 'Сomfort impressive', 641000),
('Ford', 'Focus', 'Sedan', 'Sync black', 1250000),
('LADA', 'Granta', 'Sedan', 'Standart black', 438000),
('Audi', 'A3', 'HatchBack', 'yellow', 2000000),
('LADA', 'Granta', 'Wagon', 'Standart black', 465030),
('LADA', 'Vesta', 'Sedan', 'Сlassic white', 638005),
('LADA', 'Granta', 'Wagon', 'Standart blue', 485000),
('LADA', 'Granta', 'Wagon', 'Comfort asphalt', 566000),
('Audi', 'A4', 'Wagon', 'white', 2900000),
('Ford', 'Focus', 'Hatchback', 'Trend white', 1027000),
('LADA', 'Granta', 'Sedan', 'Standart blue', 438000),
('LADA', 'Granta', 'Wagon', 'Luxe purple', 662000),
('LADA', 'Vesta', 'Wagon', 'Comfort yellow', 679010),
('Ford', 'Focus', 'Sedan', 'Trend white', 1230000),
('Audi', 'A3', 'HatchBack', 'black', 2000000),
('LADA', 'Granta', 'Wagon', 'Comfort cyan', 566000),
('LADA', 'Granta', 'Wagon', 'Luxe brown', 662080),
('LADA', 'Granta', 'Wagon', 'Luxe like a boss', 662100),
('LADA', 'Vesta', 'Sedan', 'Сomfort navy', 631000),
('LADA', 'Vesta', 'Sedan', 'Luxe blue', 636000),
('Ford', 'Focus', 'Hatchback', 'Sync black', 1082000),
('Ford', 'Focus', 'Hatchback', 'Sync white', 1092000)
;

I sort cars in a way:

So, here's the query for it:

SELECT
  *,
  MIN(price) OVER win_vendor min_price_vendor,
  MIN(price) OVER win_model min_price_model,
  MIN(price) OVER win_body min_price_body
FROM
  car
WINDOW
  win_vendor AS (PARTITION BY vendor_name),
  win_model AS (PARTITION BY vendor_name, model_name),
  win_body AS (PARTITION BY vendor_name, model_name, body_type)
ORDER BY
  min_price_vendor,
  min_price_model,
  min_price_body,
  price,
  specifications_name

I want to ask you how to process the pagination. I need to paginate my sorted reults into pages, number of rows that difer from each to other, so I cannot use LIMIT/OFFSET functions; I need every page to begin (or end) on the edge of vendor-model-body block containig at least N rows.

I think, it's better to illustate, an example for N=10 rows: click for image.

With the data shown above I hawe pages of 15, 15,17,13 rows size.

I wish to have a page_number field to add a "WHERE page_number = K" into my app query to get the Kth page.

Please tell me how can I form the page number field for such conditions.

Thank you!

Upvotes: 3

Views: 361

Answers (2)

Urvin
Urvin

Reputation: 41

The main problem seems to me is to save the state of page iterators. A custom window function would be a best solution probably, but I can google no examples of writing it.

I found, that PostgreSql allows to save "static variables". We can use current_setting / set_config functions for that. Also set_config allows to save values just inly for active transaction, that's great enough.

So I wrote a function using these "static variables" that can be used with a sorted list having a string grouping key. In my case this key is vendor-model-body.

CREATE OR REPLACE FUNCTION grouped_pagination_page(current_key VARCHAR, per_page INT4) RETURNS INT4 AS $$
  DECLARE
    last_key VARCHAR;
    last_row_count INT4;
    last_page INT4;
  BEGIN
    SELECT COALESCE(current_setting('GPP.last_key', TRUE), '') INTO last_key;
    SELECT CAST(COALESCE(NULLIF(current_setting('GPP.last_row_count', TRUE),''),'0') AS INT) INTO last_row_count;
    SELECT CAST(COALESCE(NULLIF(current_setting('GPP.last_page', TRUE),''),'1') AS INT) INTO last_page;

    IF current_key <> last_key THEN
      PERFORM set_config('GPP.last_key', current_key, TRUE);
      IF last_row_count >= per_page THEN
        last_page = last_page + 1;
        last_row_count = 0;

        PERFORM set_config('GPP.last_page', last_page::VARCHAR, TRUE);
      END IF;
    END IF;

    last_row_count = last_row_count + 1;
    PERFORM set_config('GPP.last_row_count', last_row_count::VARCHAR, TRUE);

    RETURN last_page;
  END;
$$ LANGUAGE 'plpgsql';

So, here's my query with a page_number field with variable rows on page count:

SELECT *,
  MIN(price) OVER win_vendor min_price_vendor,
  MIN(price) OVER win_model min_price_model,
  MIN(price) OVER win_body min_price_body,
  grouped_pagination_page((vendor_name || model_name || body_type)::VARCHAR, 10) page_number
FROM
  car
WINDOW
  win_vendor AS (PARTITION BY vendor_name),
  win_model AS (PARTITION BY vendor_name, model_name),
  win_body AS (PARTITION BY vendor_name, model_name, body_type)
ORDER BY min_price_vendor,
  min_price_model,
  min_price_body,
  price,
  specifications_name

It returns as expecting 15,15,17,13 rows for a page;

It's not elegant solution but works.

Upvotes: 1

S-Man
S-Man

Reputation: 23736

I did something very similar already here: Paginate grouped query results with limit per page

As I stated there I don't find any solution for a single query. The problem is that your pages can produce a very dynamic row count. So every page content hardly depends on the pages before. So you cannot find a simple solution in one query which refers to its own result a few rows before.

So you will need a little function to create your result. I wrote a function which takes the parameters "min row count per page" and "expected page id" (I took the function from the SO question above as basis for this one - so the both results are quiet similar):

demo:db<>fiddle

CREATE OR REPLACE FUNCTION get_category_for_page(_min_rows int, _page_id int) RETURNS int[] AS $$
DECLARE
    _remainder int := _min_rows;
    _page_counter int := 1;
    _categories int[] = '{}';
    _temprow record;
BEGIN
    FOR _temprow IN

        SELECT                                                    -- 1 
            min_price_vendor,
            min_price_model,
            min_price_body, 
            COUNT(*)
        FROM (
            -- <your query>
        ) s
        GROUP BY
            min_price_vendor,
            min_price_model,
            min_price_body
        ORDER BY
            min_price_vendor,
            min_price_model,
            min_price_body

    LOOP
        IF (_page_counter = _page_id) THEN                        -- 2
            _categories := _categories || _temprow.min_price_body;
        END IF;

        IF (_remainder - _temprow.count < 0) THEN                 -- 3
            _page_counter := _page_counter + 1;
            _remainder := _max_rows;
        ELSE 
            _remainder := _remainder - _temprow.count;            -- 4
        END IF;

        IF (_page_counter > _page_id) THEN                        -- 5
            EXIT;
        END IF;

    END LOOP;

    RETURN _categories;
END;
$$ LANGUAGE plpgsql;

Explanation:

  1. This query calculates the amount of rows per category in your query. The result is going to be iterated within the LOOP:
  2. If the _page_counter equals the interesting _page_id the current category will be added to the output. This can happen several times.
  3. The _remainder stores the value of how many rows already fit into the current page. If the current category has more rows than the remainder allows a new page is generated (_page_counter increased) and the remainder will be reset.
  4. Otherwise the remainder will be decreased by the number of rows of the current category
  5. If the _page_counter is higher than the interesting _page_id no further calculations are needed

Now you can call the function this way:

SELECT get_category_for_page(10, 2);

So finally your query will look like:

SELECT 
    *
FROM -- <your query>
WHERE 
    min_price_body= ANY(get_category_for_page(10, 2)) 

Disclaimer

I believe there are certain special cases that should be tested (and in failing test the function has to be increased) but in general this idea should work.

Upvotes: 1

Related Questions