Reputation: 101

Combining COUNT and RANK - PostgreSQL

What I need to select is total number of trips made by every 'id_customer' from table user and their id, dispatch_seconds, and distance for first order. id_customer, customer_id, and order_id are strings.

It should looks like this

+------+--------+------------+--------------------------+------------------+
|  id  | count  | #1order id | #1order dispatch seconds | #1order distance |
+------+--------+------------+--------------------------+------------------+
| 1ar5 |      3 | 4r56       |                        1 |              500 |
| 2et7 |      2 | dc1f       |                        5 |              100 |
+------+--------+------------+--------------------------+------------------+

Cheers!

Original post was edited as during discussion S-man helped me to find exact problem solution. Solution by S-man https://dbfiddle.uk/?rdbms=postgres_10&fiddle=e16aa6008990107e55a26d05b10b02b5

Upvotes: 1

Answers (4)

S-Man

Reputation: 23756

db<>fiddle

SELECT 
    customer_id,
    order_id,
    order_timestamp,
    dispatch_seconds,
    distance
FROM (
    SELECT 
        *, 
        count(*) over (partition by customer_id),    -- A
        first_value(order_id) over (partition by customer_id order by order_timestamp) -- B
    FROM orders
)s

WHERE order_id = first_value -- C

https://www.postgresql.org/docs/current/static/tutorial-window.html

A window function which gets the total record count per user

B window function which orders all records per user by timestamp and gives the first order_id of the corresponding user. Using first_value instead of min has one benefit: Maybe it could be possible that your order IDs are not really increasing by timestamp (maybe two orders come in simultaneously or your order IDs are not sequential increasing but some sort of hash)

--> both are new columns

C now get all columns where the "first_value" (aka the first order_id by timestamp) equals the order_id of the current row. This gives all rows with the first order by user.

Result:

customer_id  count  order_id  order_timestamp      dispatch_seconds  distance  
-----------  -----  --------  -------------------  ----------------  --------  
1ar5         3      4r56      2018-08-16 17:24:00  1                 500       
2et7         2      dc1f      2018-08-15 01:24:00  5                 100

Note that in these test data the order "dc1f" of user "2et7" has a smaller timestamp but comes later in the rows. It is not the first occurrence of the user in the table but nevertheless the one with the earliest order. This should demonstrate the case first_value vs. min as described above.

Upvotes: 2

Gordon Linoff

Reputation: 1271003

You are on the right track. Just use conditional aggregation:

SELECT o.customer_id, COUNT(*)
       MAX(CASE WHEN seqnum = 1 THEN o.order_id END) as first_order_id
FROM (SELECT o.*,
             ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_timestamp ASC) as seqnum
      FROM orders o
     ) o
GROUP BY o.customer_id;

Your JOIN is not necessary for this query.

Upvotes: 1

Richard Hansell

Reputation: 5403

I think there are many mistakes in your original query, your rank isn't partitioned, the order by clause seems incorrect, you filter out all but one "random" order, then apply the count, the list goes on.

Something like this seems closer to what you seem to want?

SELECT
    customer_id,
    order_count,
    order_id
FROM (
    SELECT
        a.customer_id,
        a.order_count,
        a.order_id,
        RANK() OVER (PARTITION BY a.order_id, a.customer_id ORDER BY a.order_count DESC) AS rank_id
    FROM (
        SELECT
            customer_id,
            order_id,
            COUNT(*) AS order_count
        FROM 
            orders
        GROUP BY
            customer_id,
            order_id) a) b
WHERE 
    b.rank_id = 1;

Upvotes: 0

Yogesh Sharma

Reputation: 50173

You can use window function :

select distinct customer_id, 
       count(*) over (partition by customer_id) as no_of_order
       min(order_id) over (partition by customer_id order by order_timestamp) as first_order_id
from orders o;

Upvotes: 0

Combining COUNT and RANK - PostgreSQL

Answers (4)

Related Questions