Mittenchops
Mittenchops

Reputation: 19664

Very slow multi-table join in sqlite

It's not clear to me why this is such a slow query:

SELECT count(*) FROM PanelsMeta
INNER JOIN Publishers ON PanelsMeta.publisherid = Publishers.id
INNER JOIN Geographies ON Geographies.geo = Publishers.geo;

Using the query analyzer, I see the queries are indexed:

QUERY PLAN
|--SCAN TABLE PanelsMeta USING COVERING INDEX PanPubId
|--SEARCH TABLE Publishers USING INTEGER PRIMARY KEY (rowid=?)
`--SEARCH TABLE Geographies USING COVERING INDEX geos (geo=?)

The tables are of the following sizes:

sqlite> select count(*) from Publishers;
55
sqlite> select count(*) from PanelsMeta;
2948875
sqlite> select count(*) from Geographies;
37323

What am I doing wrong?

Variations I attempt produce the same query plan and are also tens of minutes slow:

SELECT count(*) FROM Geographies
LEFT JOIN Publishers ON Publishers.geo = Geographies.geo 
LEFT JOIN PanelsMeta ON PanelsMeta.publisherid = Publishers.id;

# QUERY PLAN
# |--SCAN TABLE Geographies USING COVERING INDEX geos
# |--SEARCH TABLE Publishers USING COVERING INDEX PubGeo (geo=?)
# `--SEARCH TABLE PanelsMeta USING COVERING INDEX PanPubId (publisherid=?)

SELECT count(*) FROM Publishers
LEFT JOIN PanelsMeta ON PanelsMeta.publisherid = Publishers.id
LEFT JOIN Geographies ON Geographies.geo = Publishers.geo;

# QUERY PLAN
# |--SCAN TABLE Publishers USING COVERING INDEX PubGeo
# |--SEARCH TABLE PanelsMeta USING COVERING INDEX PanPubId (publisherid=?)
# `--SEARCH TABLE Geographies USING COVERING INDEX geos (geo=?)

Update

Schema information is below:

CREATE TABLE PanelsMeta(
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  f1 TEXT, 
  f2 TEXT, 
  f3 TEXT, 
  f4 DATETIME,
  f5 DATETIME,
  f6 TEXT, 
  f7 TEXT,
  publisherid INTEGER,
  FOREIGN KEY(publisherid) REFERENCES Publishers(id) ON DELETE CASCADE ON UPDATE CASCADE
);

CREATE INDEX ids ON PanelsMeta (id);
CREATE INDEX pp1 ON PanelsMeta (publisherid);
CREATE INDEX pp2 ON PanelsMeta (f1);
CREATE INDEX pp3 ON PanelsMeta (f1,publisherid);

and

CREATE TABLE Publishers(
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  geo TEXT,
  f3 TEXT NOT NULL, 
  f4 TEXT NOT NULL,
  f5 TEXT,
  f6 TEXT
);

CREATE INDEX zf3 ON Publishers (f3);
CREATE INDEX zgeo ON Publishers (Geo);
CREATE INDEX zf6 ON Publishers (f6);
CREATE INDEX zid ON Publishers (id);
CREATE INDEX zf3g ON Publishers (f3,geo);
CREATE INDEX zf3gf6 ON Publishers (f3,geo,f6);

and

CREATE TABLE Geographies(
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  geo TEXT NOT NULL,
  f3 TEXT NOT NULL,
  f4 TEXT,
  f5 DATETIME,
  f6 TEXT,
  f7 TEXT,
  f7 JSON DEFAULT '{}',
  f8 TEXT
);

CREATE INDEX g ON Geographies (geo);
CREATE INDEX gf3 ON Geographies (f3);

Upvotes: 1

Views: 1265

Answers (1)

Tim Werner
Tim Werner

Reputation: 31

I had the same problem when I tried to INNER JOIN 6 tables with each (1 - 100) rows in it. Each table had only one column.

However my full dataset is 18 GB and about 11 million rows

I solved the problem by putting all data in one table and then using the 'where in' statement. It's strange but it's way faster (about 1 second instead of couple minutes)

Upvotes: 3

Related Questions