Reputation: 101
What I need to select is total number of trips made by every 'id_customer' from table user and their id, dispatch_seconds, and distance for first order. id_customer, customer_id, and order_id are strings.
It should looks like this
+------+--------+------------+--------------------------+------------------+
| id | count | #1order id | #1order dispatch seconds | #1order distance |
+------+--------+------------+--------------------------+------------------+
| 1ar5 | 3 | 4r56 | 1 | 500 |
| 2et7 | 2 | dc1f | 5 | 100 |
+------+--------+------------+--------------------------+------------------+
Cheers!
Original post was edited as during discussion S-man helped me to find exact problem solution. Solution by S-man https://dbfiddle.uk/?rdbms=postgres_10&fiddle=e16aa6008990107e55a26d05b10b02b5
Upvotes: 1
Views: 6176
Reputation: 23756
SELECT
customer_id,
order_id,
order_timestamp,
dispatch_seconds,
distance
FROM (
SELECT
*,
count(*) over (partition by customer_id), -- A
first_value(order_id) over (partition by customer_id order by order_timestamp) -- B
FROM orders
)s
WHERE order_id = first_value -- C
https://www.postgresql.org/docs/current/static/tutorial-window.html
A window function which gets the total record count per user
B window function which orders all records per user by timestamp and gives the first order_id
of the corresponding user. Using first_value
instead of min
has one benefit: Maybe it could be possible that your order IDs are not really increasing by timestamp (maybe two orders come in simultaneously or your order IDs are not sequential increasing but some sort of hash)
--> both are new columns
C now get all columns where the "first_value" (aka the first order_id
by timestamp) equals the order_id
of the current row. This gives all rows with the first order by user.
Result:
customer_id count order_id order_timestamp dispatch_seconds distance
----------- ----- -------- ------------------- ---------------- --------
1ar5 3 4r56 2018-08-16 17:24:00 1 500
2et7 2 dc1f 2018-08-15 01:24:00 5 100
Note that in these test data the order "dc1f" of user "2et7" has a smaller timestamp but comes later in the rows. It is not the first occurrence of the user in the table but nevertheless the one with the earliest order. This should demonstrate the case first_value
vs. min
as described above.
Upvotes: 2
Reputation: 1271003
You are on the right track. Just use conditional aggregation:
SELECT o.customer_id, COUNT(*)
MAX(CASE WHEN seqnum = 1 THEN o.order_id END) as first_order_id
FROM (SELECT o.*,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_timestamp ASC) as seqnum
FROM orders o
) o
GROUP BY o.customer_id;
Your JOIN
is not necessary for this query.
Upvotes: 1
Reputation: 5403
I think there are many mistakes in your original query, your rank isn't partitioned, the order by clause seems incorrect, you filter out all but one "random" order, then apply the count, the list goes on.
Something like this seems closer to what you seem to want?
SELECT
customer_id,
order_count,
order_id
FROM (
SELECT
a.customer_id,
a.order_count,
a.order_id,
RANK() OVER (PARTITION BY a.order_id, a.customer_id ORDER BY a.order_count DESC) AS rank_id
FROM (
SELECT
customer_id,
order_id,
COUNT(*) AS order_count
FROM
orders
GROUP BY
customer_id,
order_id) a) b
WHERE
b.rank_id = 1;
Upvotes: 0
Reputation: 50173
You can use window function :
select distinct customer_id,
count(*) over (partition by customer_id) as no_of_order
min(order_id) over (partition by customer_id order by order_timestamp) as first_order_id
from orders o;
Upvotes: 0