James Risner
James Risner

Reputation: 6074

Slow performance on MariaDB 10.7.6 using a query with many subselects

Running on a M1 Max macOS 12.6 with MariaDB 10.7 installed via homebrew.

I'm having trouble isolating the reason this is slower than I expected. I expected 2 to 10 minutes, but I'm seeing more than an hour. There are 57 million records, it seems to be scanning 55 million. At the bottom, I have listed the queries and the counts of each. I suspect the Using temporary; Using filesort. I have added an index on most referenced columns.

I am looking for more ideas to try to resolve this performance. If this is the best I can do, that is also a valid outcome. I am also looking at migrating to full text searching, but I am not ready for that yet.

ANALYZE results in table format:

id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra
1 PRIMARY RPost index_merge rpost_idx_6, rpost_idx_7, rpost_idx_8 rpost_idx_6, rpost_idx_7, rpost_idx_8 6,6,6 NULL 6176002 55104309.00 100.00 0.02 Using intersect( rpost_idx_6, rpost_idx_7, rpost_idx_8); Using where; Using temporary; Using filesort
11 DEPENDENT SUBQUERY RPost eq_ref PRIMARY PRIMARY 4 test.RPost.Id 1 1.00 100.00 100.00
10 DEPENDENT SUBQUERY RPost eq_ref PRIMARY PRIMARY 4 test.RPost.Id 1 1.00 100.00 100.00
9 DEPENDENT SUBQUERY RPost eq_ref PRIMARY PRIMARY 4 test.RPost.Id 1 1.00 100.00 100.00
8 DEPENDENT SUBQUERY History ref posthistory_idx_1, history_idx_2, history_idx_4 history_idx_1 4 test.RPost.Id 1 2.61 2.98 0.78 Using where
7 DEPENDENT SUBQUERY History ref posthistory_idx_1 ,history_idx_2, history_idx_4 history_idx_1 4 test.RPost.Id 1 2.61 50 96.34 Using where
6 DEPENDENT SUBQUERY ALL NULL NULL NULL NULL 2 2.00 100.00 0.59 Using where
12 DERIVED NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL No tables used
13 UNION NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL No tables used
NULL UNION RESULT <union12,13> ALL NULL NULL NULL NULL NULL 2.00 NULL NULL
5 DEPENDENT SUBQUERY ALL NULL NULL NULL NULL 2 2.00 100.00 0.12 Using where
2 DERIVED NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL No tables used
3 UNION NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL No tables used
NULL UNION RESULT <union2,3> ALL NULL NULL NULL NULL NULL 2.00 NULL NULL

The query:

WITH Common as (
  SELECT '% strings %' AS Value
  UNION
  SELECT '% tofind %'
),
MyRPost (Id, STitle, SBody, Queued, Frozen)
AS
(
SELECT
  RPost.Id,
  (
    SELECT count(*) FROM Common
    WHERE RPost.Title Like Common.Value
  ) AS STitle,
  (
    SELECT count(*) FROM Common
    WHERE RPost.Body Like Common.Value
  ) AS SBody,
  (SELECT count(History.TypeId) FROM History
          WHERE History.PostId = RPost.Id
            AND History.TypeId BETWEEN 1 AND 9
  )%2 AS Queued,
  (SELECT count(History.TypeId) FROM History
          WHERE History.PostId = RPost.Id
            AND History.TypeId BETWEEN 10 AND 19
  )%2 AS Frozen
  FROM RPost
  WHERE RPost.CompletedDate IS NULL
    AND RPost.CancelDate is NULL
    AND RPost.PausedDate IS NULL
)
SELECT STitle, SBody, Id, Queued, Frozen,
  (SELECT RPost.ParentId FROM RPost WHERE MyRPost.Id=RPost.Id) as OriginId,
  (SELECT RPost.PostTypeId FROM RPost WHERE MyRPost.Id=RPost.Id) as Type,
  (SELECT RPost.Title FROM RPost WHERE MyRPost.Id=RPost.Id) as Title
FROM MyRPost
WHERE (Queued+Frozen) = 0
  AND (STitle+SBody) > 1
ORDER BY (STitle+SBody), Id

Table Structure:

CREATE TABLE History (
    Id INT NOT NULL PRIMARY KEY,
    TypeId SMALLINT NOT NULL,
    PostId INT NOT NULL
);
CREATE TABLE RPost (
    Id INT NOT NULL PRIMARY KEY,
    Title varchar(256),
    Body text NULL,
    PostTypeId TINYINT NOT NULL ,
    ParentId INT,
    CompletedDate DATETIME,
    CancelDate DATETIME,
    PausedDate DATETIME
);

Counts:

Query Count
select * from rpost 56 million
remove non-NULL CompletedDate / CancelDate / PausedDate 55 million
now remove (Queued+Frozen) = 0 15 million
finally process likes 8505

Upvotes: 0

Views: 428

Answers (1)

Parfait
Parfait

Reputation: 107587

Consider refactoring the query for conditional aggregation without the nested sub-selects:

WITH Common AS (
  SELECT '% strings %' AS Value
  UNION ALL
  SELECT '% tofind %'
),
MyRPost (Id, STitle, SBody, Queued, Frozen) AS (
  SELECT
     r.Id,
     COUNT(ct.Value) AS STitle,
     COUNT(cb.Value) AS SBody,
     SUM(h.TypeId BETWEEN 1 AND 9) % 2 AS Queued,
     SUM(h.TypeId BETWEEN 10 AND 19) % 2 AS Frozen
  FROM RPost r
  LEFT JOIN Common ct ON r.Title LIKE ct.Value
  LEFT JOIN Common cb ON r.Body LIKE cb.Value
  LEFT JOIN History h ON h.PostId = r.Id
  WHERE r.CompletedDate IS NULL
    AND r.CancelDate is NULL
    AND r.PausedDate IS NULL
  GROUP BY r.Id
  HAVING (Queued + Frozen) = 0
     AND (STitle + SBody) > 1
)

SELECT 
   m.STitle, m.SBody, m.Id, m.Queued, m.Frozen,
   r.ParentId AS OriginId, r.PostTypeId AS Type, r.Title
FROM MyRPost m
LEFT JOIN RPost r ON m.Id = r.Id
ORDER BY (m.STitle + m.SBody), m.Id

Upvotes: 1

Related Questions