Reputation: 7921
Can someone please point out what I'm doing wrong here?
I've got a table of timestamped data:
CREATE TABLE data_record (
id UUID,
t TIMESTAMP,
d INTEGER,
PRIMARY KEY(id, t)
);
I've got some data from some projects in that table, and now I'm trying to create a batch process to maintain one project where the data value (d
) is the sum of the other projects. To start, I'm trying to insert rows for the summary-project for all timestamps where the other projects have rows:
WITH source_ids AS (
SELECT UNNEST(ARRAY['1e77b896-9e1b-11e7-a0db-f23c91e2b423'::uuid, '7dd37dd0-9e1a-11e7-a0db-f23c91e2b423'::uuid])
)
INSERT INTO data_record (id, t) (
SELECT DISTINCT 'ab3b516e-acd7-11e7-a0db-f23c91e2b423'::uuid, d1.t
FROM data_record d1
LEFT JOIN data_record d2
ON d1.t = d2.t
AND d1.id IN (SELECT * FROM source_ids)
AND d2.id = 'ab3b516e-acd7-11e7-a0db-f23c91e2b423'::uuid
WHERE d2.t IS NULL;
As far as I can see, that should create a row with id ab3b516e-acd7-11e7-a0db-f23c91e2b423
for every timestamp in a row with either of the IDs in source_ids
. But after executing that query, I then do this:
WITH source_ids AS (
SELECT UNNEST(ARRAY['1e77b896-9e1b-11e7-a0db-f23c91e2b423'::uuid, '7dd37dd0-9e1a-11e7-a0db-f23c91e2b423'::uuid])
)
SELECT COUNT(d1.t)
FROM data_record d1
LEFT JOIN data_record d2
ON d1.t = d2.t
AND d1.id IN (SELECT * FROM source_ids)
AND d2.id = 'ab3b516e-acd7-11e7-a0db-f23c91e2b423'
WHERE d2.t IS NULL;
The INSERT
query affected 28,237 rows; the SELECT query returns 55561 where I think it should return zero.
I guess there's something about the AND d1.id IN (SELECT * FROM source_ids)
that's not working as it looks like it should, but what?
Upvotes: 0
Views: 57
Reputation: 1269463
When using LEFT JOIN
, filters on the first table should be in the WHERE
clause. Filters on the second in the ON
clause. Usually, the mistake is made on the filters on the second table. Yours is on filters on the first table.
So:
WITH source_ids AS (
SELECT UNNEST(ARRAY['1e77b896-9e1b-11e7-a0db-f23c91e2b423'::uuid, '7dd37dd0-9e1a-11e7-a0db-f23c91e2b423'::uuid])
)
INSERT INTO data_record (id, t) (
SELECT DISTINCT 'ab3b516e-acd7-11e7-a0db-f23c91e2b423'::uuid, d1.t
FROM data_record d1 LEFT JOIN data_record
d2
ON d1.t = d2.t AND
d2.id = 'ab3b516e-acd7-11e7-a0db-f23c91e2b423'::uuid
WHERE d2.t IS NULL AND d1.id IN (SELECT * FROM source_ids);
That still seems over-complicated to me. I might suggest:
WITH . . .
SELECT DISTINCT 'ab3b516e-acd7-11e7-a0db-f23c91e2b423'::uuid, dr.t
FROM (SELECT dr.*,
COUNT(*) FILTER (WHERE dr.id = 'ab3b516e-acd7-11e7-a0db-f23c91e2b423'::uuid) OVER (PARTITION BY t) as cnt
FROM data_record dr
) dr
WHERE cnt = 0;
Depending on how the data and indexes are set up, your original version might be faster.
Upvotes: 2