Tom
Tom

Reputation: 7921

SQL Left Join Behaving Weirdly

Can someone please point out what I'm doing wrong here?

I've got a table of timestamped data:

CREATE TABLE data_record (
    id UUID,
    t  TIMESTAMP,
    d  INTEGER,
    PRIMARY KEY(id, t)
);

I've got some data from some projects in that table, and now I'm trying to create a batch process to maintain one project where the data value (d) is the sum of the other projects. To start, I'm trying to insert rows for the summary-project for all timestamps where the other projects have rows:

WITH source_ids AS (
    SELECT UNNEST(ARRAY['1e77b896-9e1b-11e7-a0db-f23c91e2b423'::uuid, '7dd37dd0-9e1a-11e7-a0db-f23c91e2b423'::uuid])
)
INSERT INTO data_record (id, t) (
     SELECT DISTINCT 'ab3b516e-acd7-11e7-a0db-f23c91e2b423'::uuid, d1.t
       FROM data_record d1
  LEFT JOIN data_record d2
         ON d1.t = d2.t
        AND d1.id IN (SELECT * FROM source_ids)
        AND d2.id = 'ab3b516e-acd7-11e7-a0db-f23c91e2b423'::uuid
      WHERE d2.t IS NULL;

As far as I can see, that should create a row with id ab3b516e-acd7-11e7-a0db-f23c91e2b423 for every timestamp in a row with either of the IDs in source_ids. But after executing that query, I then do this:

WITH source_ids AS (
    SELECT UNNEST(ARRAY['1e77b896-9e1b-11e7-a0db-f23c91e2b423'::uuid, '7dd37dd0-9e1a-11e7-a0db-f23c91e2b423'::uuid])
)
   SELECT COUNT(d1.t)
     FROM data_record d1
LEFT JOIN data_record d2
       ON d1.t = d2.t
      AND d1.id IN (SELECT * FROM source_ids)
      AND d2.id = 'ab3b516e-acd7-11e7-a0db-f23c91e2b423'
    WHERE d2.t IS NULL;

The INSERT query affected 28,237 rows; the SELECT query returns 55561 where I think it should return zero.

I guess there's something about the AND d1.id IN (SELECT * FROM source_ids) that's not working as it looks like it should, but what?

Upvotes: 0

Views: 57

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269463

When using LEFT JOIN, filters on the first table should be in the WHERE clause. Filters on the second in the ON clause. Usually, the mistake is made on the filters on the second table. Yours is on filters on the first table.

So:

WITH source_ids AS (
    SELECT UNNEST(ARRAY['1e77b896-9e1b-11e7-a0db-f23c91e2b423'::uuid, '7dd37dd0-9e1a-11e7-a0db-f23c91e2b423'::uuid])
)
INSERT INTO data_record (id, t) (
     SELECT DISTINCT 'ab3b516e-acd7-11e7-a0db-f23c91e2b423'::uuid, d1.t
     FROM data_record d1 LEFT JOIN data_record
          d2
          ON d1.t = d2.t AND
             d2.id = 'ab3b516e-acd7-11e7-a0db-f23c91e2b423'::uuid
      WHERE d2.t IS NULL AND d1.id IN (SELECT * FROM source_ids);

That still seems over-complicated to me. I might suggest:

WITH . . . 
SELECT DISTINCT 'ab3b516e-acd7-11e7-a0db-f23c91e2b423'::uuid, dr.t
FROM (SELECT dr.*,
             COUNT(*) FILTER (WHERE dr.id = 'ab3b516e-acd7-11e7-a0db-f23c91e2b423'::uuid) OVER (PARTITION BY t) as cnt
      FROM data_record dr
     ) dr
WHERE cnt = 0;

Depending on how the data and indexes are set up, your original version might be faster.

Upvotes: 2

Related Questions