user1487861
user1487861

Reputation: 472

Big difference in time to delete from postgres temp tables

I had a query that was taking a long time to run and so I re-wrote it and now it takes almost no time at all to run - but I don't understand why.

I can understand a small difference, but can someone help explain the massive difference in time taken to run these two (seeming very similar) statements?

First:

DELETE FROM     t_old where company_id not in (select company_id from t_prop);

Second:

DELETE FROM     t_old a
using           t_prop b 
where           a.company_id=b.company_id
and             b.company_id is null;

Execution plan from first:

'[
  {
    "Plan": {
  "Startup Cost": 0,
  "Plans": [
    {
      "Filter": "(NOT (SubPlan 1))",
      "Startup Cost": 0,
      "Plans": [
        {
          "Startup Cost": 0,
          "Plans": [
            {
              "Startup Cost": 0,
              "Node Type": "Seq Scan",
              "Plan Rows": 158704,
              "Relation Name": "t_prop",
              "Alias": "t_prop",
              "Parallel Aware": false,
              "Parent Relationship": "Outer",
              "Plan Width": 4,
              "Total Cost": 2598.04
            }
          ],
          "Node Type": "Materialize",
          "Plan Rows": 158704,
          "Parallel Aware": false,
          "Parent Relationship": "SubPlan",
          "Plan Width": 4,
          "Subplan Name": "SubPlan 1",
          "Total Cost": 4011.56
        }
      ],
      "Node Type": "Seq Scan",
      "Plan Rows": 21760,
      "Relation Name": "t_old",
      "Alias": "t_old",
      "Parallel Aware": false,
      "Parent Relationship": "Member",
      "Plan Width": 6,
      "Total Cost": 95923746.03
    }
  ],
  "Node Type": "ModifyTable",
  "Plan Rows": 21760,
  "Relation Name": "t_old",
  "Alias": "t_old",
  "Parallel Aware": false,
  "Operation": "Delete",
  "Plan Width": 6,
  "Total Cost": 95923746.03
}

} ]'

Execution plan from second

'[
  {
    "Plan": {
      "Startup Cost": 0.71,
      "Plans": [
        {
          "Startup Cost": 0.71,
          "Plans": [
            {
              "Startup Cost": 0.42,
              "Scan Direction": "Forward",
              "Plan Width": 10,
              "Node Type": "Index Scan",

          "Index Cond": "(company_id IS NULL)",
          "Plan Rows": 1,
          "Relation Name": "t_prop",
          "Alias": "b",
          "Parallel Aware": false,
          "Parent Relationship": "Outer",
          "Total Cost": 8.44,
          "Index Name": "t_prop_idx2"
        },
        {
          "Startup Cost": 0.29,
          "Scan Direction": "Forward",
          "Plan Width": 10,
          "Node Type": "Index Scan",
          "Index Cond": "(company_id = b.company_id)",
          "Plan Rows": 5,
          "Relation Name": "t_old",
          "Alias": "a",
          "Parallel Aware": false,
          "Parent Relationship": "Inner",
          "Total Cost": 8.38,
          "Index Name": "t_old_idx"
        }
      ],
      "Node Type": "Nested Loop",
      "Plan Rows": 5,
      "Join Type": "Inner",
      "Parallel Aware": false,
      "Parent Relationship": "Member",
      "Plan Width": 12,
      "Total Cost": 16.86
    }
  ],
  "Node Type": "ModifyTable",
  "Plan Rows": 5,
  "Relation Name": "t_old",
  "Alias": "a",
  "Parallel Aware": false,
  "Operation": "Delete",
  "Plan Width": 12,
  "Total Cost": 16.86
}

} ]'

Upvotes: 2

Views: 102

Answers (1)

Łukasz Kamiński
Łukasz Kamiński

Reputation: 5940

Your second query will delete nothing, that is why it is so much faster.

EDIT: I guess I should explain why it will delete nothing. So...

What you wanted to do is actually this:

DELETE FROM     t_old a
using           t_old a2
LEFT JOIN       t_prop b ON b.company_id = a2.company_id
where           a.company_id=a2.company_id
and             b.company_id is null;

It could be faster, slower or equal in speed to your first query, but it will do the same thing.

Your second query however will only delete rows from t_old if there are also rows in t_prop with matching company_id, because you are making INNER JOIN in there. But there is also additional condition b.company_id is null with will limit additionally rows in t_prop to only those that have this column equal to NULL, but = operator does not work for NULL values and will never evaluate to true, so your first condition will always fail if you were to satisfy your second one. Considering there is AND between them, both must be satisfied and this is impossible.

What could work and would delete rows in t_old WHERE company_id IS NULL that are also in t_prop with same condition satisfied is that:

DELETE FROM     t_old a
using           t_prop b 
where           a.company_id IS NOT DISTINCT FROM b.company_id
and             b.company_id is null;

But it still wouldn't do what first query does.

Upvotes: 1

Related Questions