Reputation: 3737
I have two tables in my database that look roughly like this:
Movement:
Timestamp visitorID Type X Y
2012-03-02 11:02:30 379 pedestrian 32 46
2012-03-01 12:32:41 654 pedestrian 54 56
2012-03-02 07:02:16 789 pedestrian 39 52
Communication: (Calls made)
Timestamp senderID toID GeneralLocation
2012-03-02 09:02:30 878 674 Grasslands
2012-03-02 11:30:01 456 213 Tundra
2012-03-02 07:02:12 789 654 Mountains
and I ran this query:
SELECT c.senderID,c.timestamp,m.timestamp,m.x,m.y
FROM communication c
JOIN movement m ON c.senderID = m.visitorID
WHERE m.timestamp = (SELECT MIN(mm.timestamp)
FROM movement mm
WHERE mm.timestamp >= c.timestamp);
That basically finds the closest movement timestamp for a given communication timestamp. What I want to do is find the location at which a certain senderID placed a call. Since not every visitorID made a call, it first checks c.senderID = m.visitorID
. So then it's goes through each row of communication
and finds the closest movement timestamp
for each timestamp
in the communication
table.
I then used EXPLAIN
to see if I am using my indexes... and I got this:
Does that mean in the second row that my indexes are not being used correct? The indexes I have in place right now are:
CREATE INDEX timestamp ON DBName.movement (timestamp);
CREATE INDEX ctstamp ON DBName.communication (timestamp);
CREATE INDEX SID_tstamp ON communication (senderID, timestamp);
CREATE INDEX VID_tstamp ON movement (visitorID, timestamp);
So basically, in the Movement
and Communication
tables, I have an index for each timestamp
. And then I have one for each ID
and timestamp
.
My question basically is, what is wrong with my indexes, and how can I modify them so they can be used?
I'm new to SQL, so any help would be greatly appreciated, thank you!!
EDIT
Upvotes: 2
Views: 163
Reputation: 108420
No, it's using an index. The ref
and keylen
tell us that.
I think you may be confused by the Using index
in the Extra column of the other rows in the EXPLAIN output. That means that the query is being satisfied entirely from the index, without a need to visit the underlying data pages.
If you're concerned about performance, it's the correlated subquery we need to look at. (The reference to c.
in that subquery.)
EDIT
I'm thinking that your query should include a predicate on mm.visitorID = c.senderID
(that's based on the normative pattern we usually see in queries like this.
It looks like you want the position (x
,y
) at the earliest timestamp
from m
that's after the timestamp on c
... and you've got a condition (predicate) that matches visitorID to senderID. I think you want to repeat that same condition in the check for "earliest" timestamp. (Otherwise, you could get a timestamp for some other visitorID)
Just that one change might speed up your query a bit, given a suitable index
... ON movement (vistorID, timestamp)
Actually including x
and y
in that index as well would let those values be returned from the index, and you'll get Using index
in your EXPLAIN output.
Taking that change into account, here's a first cut at a re-write that avoids a correlated subquery.
SELECT c.senderID
, c.timestamp
, m.timestamp AS m_timestamp
, m.x
, m.y
FROM ( SELECT mc.senderID
, mc.timestamp
, MIN(mm.timestamp) AS min_timestamp
FROM communication mc --< table alias mc
JOIN movement mm --< table alias mm
ON mm.visitorID = mc.senderID
ON mm.timestamp >= mc.timestamp
GROUP BY mc.senderID, mc.timestamp
ORDER BY mc.senderID, mc.timestamp
) r --< table alias r
JOIN movement m --< table alias m
ON m.visitorID = r.senderID
AND m.timestamp = r.min_timestamp
JOIN communication c
ON c.senderID = r.senderID
AND c.timestamp = r.timestamp
ORDER BY r.senderID, r.timestamp
Here, the inline view (aliased as r
is tasked with getting the earliest timestamp from movement
). That's going to spin the results into a "temporary" table. (I put temporary in quotes; it's temporary in that its around for the duration of the query, but it's called a derived table in the MySQL vernacular.)
There's a GROUP BY
operation; hopefully, we can avoid a Using filesort
operation and have that optimized using a suitable index.
Once we have that, it should just be index lookups into c
and m
.
I'm looking at this again. We don't actually need to do the join to c
, we've already got the values from the inline view.
I say give this a rip, and let me know how big a smoke ball it makes. (No guarantees, because this isn't tested.)
SELECT r.senderID
, r.timestamp
, m.timestamp AS m_timestamp
, m.x
, m.y
FROM ( SELECT mc.senderID
, mc.timestamp
, MIN(mm.timestamp) AS min_timestamp
FROM communication mc -- < table alias mc
JOIN movement mm -- < table alias mm
ON mm.visitorID = mc.senderID
ON mm.timestamp >= mc.timestamp
GROUP BY mc.senderID, mc.timestamp
ORDER BY mc.senderID, mc.timestamp
) r -- < table alias r
JOIN movement m -- < table alias m
ON m.visitorID = r.senderID
AND m.timestamp = r.min_timestamp
ORDER BY r.senderID, r.timestamp
EDIT corrected invalid references c.
to mc.
in the previous two queries.
Upvotes: 1