Reputation: 335
I have a table which describes agent who book tickets for different customers. following data describes one customer data.
From the above data what i am expecting is
The out put meaning is, i want to group the queues first he booked some tickets to singapore, then austin, again singapore and delhi
How we can achieve this in SQL please help me on this
if out put is like as follows is also helpful
Upvotes: 1
Views: 121
Reputation: 172993
Below is for BigQuery Standard SQL
#standardSQL
SELECT agent_id,
STRING_AGG(DISTINCT travel_destination) AS travel_destination
FROM `project.dataset.table`
GROUP BY agent_id
it will produce following output
Row agent_id travel_destination
1 A1001 Singapore,Austin,Delhi
Looks like expected output is Singapore,Austin,Singapore,Delhi
- below is yet another option for this
#standardSQL
CREATE TEMP FUNCTION DedupConsecutive(line STRING) RETURNS STRING LANGUAGE js AS """
return line.split(",").filter(function(value,index,arr){return value != arr[index+1];}).join(",");
""";
SELECT agent_id,
DedupConsecutive(STRING_AGG(travel_destination ORDER BY date_of_booking)) destinations
FROM `project.dataset.table`
GROUP BY agent_id
Same sentiment as by Gordon - I cannot think of a simpler solution.
:o)
Upvotes: 0
Reputation: 1269923
I would just use lag()
:
SELECT t.agent_id, t.travel_dest
FROM (SELECT t.*,
LAG(travel_dest) OVER (PARTITION BY agent_id ORDER BY bookdt) as prev_travel_dest
FROM t
) t
WHERE prev_travel_dest IS NULL OR prev_travel_dest <> travel_dest
ORDER BY agent_id, bookdt;
I cannot think of a simpler solution.
Upvotes: 0
Reputation: 222482
This is a gaps and island problem. To solve it, you need to generate groups of adjacent records. This is usually done by comparing row numbers accross two different partitions.
Consider:
select
agent_id,
travel_destination,
min(date_of_booking) first_date_of_booking,
max(date_of_booking) max_date_of_booking
from (
select
t.*,
row_number()
over(partition by agent_id order by date_of_booking) rn1,
row_number()
over(partition by agent_id, travel_destination order by date_of_booking) rn2
from mytable t
) t
group by
agent_id,
rn1 - rn2,
travel_destination
order by first_date_of_booking
Note that I added the beginning and ending date of each group to the answer, because I find that it makes the answer more meaningful.
Another remark: based on your sample data, it is unclear whether you want to put the customerid
in the group; I assumed not (if yes, you need to add that column to both partitions).
Given this (simplified) dataset:
agent_id | travel_destination | customer_id | date_of_booking :------- | :----------------- | :---------- | :-------------- A1001 | Singapore | C1001 | 2019-06-10 A1001 | Singapore | C1001 | 2019-06-11 A1001 | Austin | C1001 | 2019-06-12 A1001 | Singapore | C1001 | 2019-06-13 A1001 | Singapore | C1001 | 2019-06-14 A1001 | Dehli | C1001 | 2019-06-15
The query returns:
agent_id | travel_destination | first_date_of_booking | max_date_of_booking :------- | :----------------- | :-------------------- | :------------------ A1001 | Singapore | 2019-06-10 | 2019-06-11 A1001 | Austin | 2019-06-12 | 2019-06-12 A1001 | Singapore | 2019-06-13 | 2019-06-14 A1001 | Dehli | 2019-06-15 | 2019-06-15
To achieve the second output that you demonstrated, you can add another level of aggregation and use string_agg()
select
agent_id,
string_agg(travel_destination order by first_date_of_booking) travel_destination
from (
-- above query
) t
group by agent_id
Upvotes: 1
Reputation: 6741
Try this - at least if your database has a function like LISTAGG, as in Vertica ...
WITH
-- this is your input - next time put it in so it can be
-- copy-pasted and formatted to the below ....
input(agent_id,travel_dest,cust_id,bookdt) AS (
SELECT 'A1001','Singapore','C1001',DATE '2109-06-10'
UNION ALL SELECT 'A1001','Singapore','C1001',DATE '2019-06-11'
UNION ALL SELECT 'A1001','Austin' ,'C1001',DATE '2019-06-19'
UNION ALL SELECT 'A1001','Austin' ,'C1001',DATE '2019-06-19'
UNION ALL SELECT 'A1001','Austin' ,'C1001',DATE '2019-06-20'
UNION ALL SELECT 'A1001','Singapore','C1001',DATE '2019-07-30'
UNION ALL SELECT 'A1001','Singapore','C1001',DATE '2019-07-31'
UNION ALL SELECT 'A1001','Delhi' ,'C1001',DATE '2019-08-01'
UNION ALL SELECT 'A1001','Delhi' ,'C1001',DATE '2019-08-10'
UNION ALL SELECT 'A1001','Delhi' ,'C1001',DATE '2019-08-10'
UNION ALL SELECT 'A1001','Delhi' ,'C1001',DATE '2019-08-10'
UNION ALL SELECT 'A1001','Delhi' ,'C1001',DATE '2019-08-10'
UNION ALL SELECT 'A1001','Delhi' ,'C1001',DATE '2019-08-25'
)
-- real WITH clause starts here - substitute comma below with "WITH" ...
,
with_prev AS (
SELECT
agent_id
, travel_dest
, LAG(travel_dest,1,'') OVER (PARTITION BY agent_id ORDER BY bookdt) AS prev_dest
FROM input
)
,
de_duped AS (
SELECT
agent_id
, travel_dest
FROM with_prev
WHERE travel_dest <> prev_dest
)
SELECT
agent_id
, LISTAGG(travel_dest) AS travel_dest
FROM de_duped
GROUP BY 1
;
You get:
agent_id | travel_dest
----------+--------------------------------------------
A1001 | Singapore,Austin,Singapore,Delhi,Singapore
Upvotes: 0