ere
ere

Reputation: 1779

Postgres distinct union only for specific columns

I have two sets of data, one of which is dynamically generated.

If I leave off the column state it works perfectly as that column doesn't really exist, my question is how can I ignore a column for the UNION so that it combines the two datasets (as it is it's the same as UNION ALL). eg I prefer the first table and want any rows from the second dataset ignored if they exist in the first one.

SELECT event_id, start_at, state
FROM event_logs
WHERE start_at BETWEEN current_date AND current_date + interval '3 weeks'
UNION
SELECT id event_id,
GENERATE_SERIES(date_trunc('week', current_date)::date + (extract(isodow from start_at)::int - 1) + start_at::time, current_date + interval '3 weeks', '1 week'::INTERVAL) AS start_at,
'draft' AS state
FROM events

Update, also tried:

WITH future_logs AS (
 SELECT id event_id,
 GENERATE_SERIES(date_trunc('week', current_date)::date + (extract(isodow from start_at)::int -  1) + start_at::time, current_date + interval '3 weeks', '1 week'::INTERVAL) AS start_at,
 'draft' AS state
 FROM events)

SELECT future_logs.event_id, future_logs.start_at, future_logs.state
FROM future_logs
LEFT JOIN event_logs ON future_logs.event_id = event_logs.event_id AND future_logs.start_at = event_logs.start_at
WHERE event_logs.start_at BETWEEN current_date AND current_date + interval '3 weeks'

But got too few results 77 vs ~1000 expected.

Upvotes: 7

Views: 14615

Answers (3)

kotresh kumar
kotresh kumar

Reputation: 41

select DISTINCT ON (date_day) date_day, state from(
SELECT day::date as date_day, null as state
FROM generate_series(now()- interval '2 week'
, now()
, interval '1 day') day
UNION ALL
select distinct
  date_trunc('day',e.updated_at) as date_day,
  max(des.state) over (partition by date_trunc('day',des.updated_at)) as state
from device_event as des where e.id=49 and e.updated_at >= now() - interval '2 week'
) dba order by 1

Upvotes: 2

JosMac
JosMac

Reputation: 2302

I would add one other column taborder into your UNION query to ensure simple ordering of the rows and use window function row_number() over(...) in following way:

SELECT
  event_id,
  start_at,
  state
FROM (
  SELECT
    event_id,
    start_at,
    state, 
    row_number(*) OVER (PARTITION BY event_id, start_at ORDER BY taborder) AS rownum 
  FROM (
    SELECT
      event_id,
      start_at,
      state,
      1 AS taborder 
    FROM original_table
    
    UNION
    
    SELECT
      event_id,
      start_at,
      state,
      2 AS taborder 
    FROM draft_table
  ) src0
) src1 
WHERE rownum = 1
ORDER BY 1, 2, 3

Upvotes: 0

wildplasser
wildplasser

Reputation: 44230

Just add NOT EXISTS() to the second leg, and you can use UNION ALL to avoid sort/merging.


SELECT event_id, start_at, state
FROM event_logs
WHERE start_at BETWEEN current_date AND current_date + interval '3 weeks'

UNION ALL

SELECT id AS event_id
        , generate_series(date_trunc('week', current_date)::date + (extract(isodow from start_at)::int - 1) + start_at::time
                , current_date + interval '3 weeks'
                , '1 week'::INTERVAL) AS start_at
        , 'draft' AS state
FROM events ev
WHERE NOT EXISTS ( SELECT*
        FROM event_logs nx
        WHERE nx.event_id =ev.id
        AND nx.start_at BETWEEN current_date AND current_date + interval '3 weeks'      )

        ;

Upvotes: 4

Related Questions