ocean800
ocean800

Reputation: 3737

Checking whether indexes are being used in SQL query?

I have two tables in my database that look roughly like this:

Movement:

Timestamp            visitorID Type        X   Y  
2012-03-02 11:02:30  379      pedestrian  32  46
2012-03-01 12:32:41  654      pedestrian  54  56
2012-03-02 07:02:16  789      pedestrian  39  52 

Communication: (Calls made)

Timestamp            senderID     toID GeneralLocation 
2012-03-02 09:02:30  878          674        Grasslands 
2012-03-02 11:30:01  456          213        Tundra 
2012-03-02 07:02:12  789          654        Mountains 

and I ran this query:

SELECT c.senderID,c.timestamp,m.timestamp,m.x,m.y
FROM communication c
JOIN movement m ON c.senderID = m.visitorID
WHERE m.timestamp = (SELECT MIN(mm.timestamp)
                 FROM movement mm 
                 WHERE mm.timestamp >= c.timestamp); 

That basically finds the closest movement timestamp for a given communication timestamp. What I want to do is find the location at which a certain senderID placed a call. Since not every visitorID made a call, it first checks c.senderID = m.visitorID. So then it's goes through each row of communication and finds the closest movement timestamp for each timestamp in the communication table.

I then used EXPLAIN to see if I am using my indexes... and I got this:

enter image description here

Does that mean in the second row that my indexes are not being used correct? The indexes I have in place right now are:

CREATE INDEX timestamp ON DBName.movement (timestamp); 

CREATE INDEX ctstamp ON DBName.communication (timestamp);

CREATE INDEX SID_tstamp ON communication (senderID, timestamp);

CREATE INDEX VID_tstamp ON movement (visitorID, timestamp);

So basically, in the Movement and Communication tables, I have an index for each timestamp. And then I have one for each ID and timestamp.

My question basically is, what is wrong with my indexes, and how can I modify them so they can be used?

I'm new to SQL, so any help would be greatly appreciated, thank you!!

EDIT

enter image description here

Upvotes: 2

Views: 163

Answers (1)

spencer7593
spencer7593

Reputation: 108420

No, it's using an index. The ref and keylen tell us that.

I think you may be confused by the Using index in the Extra column of the other rows in the EXPLAIN output. That means that the query is being satisfied entirely from the index, without a need to visit the underlying data pages.

If you're concerned about performance, it's the correlated subquery we need to look at. (The reference to c. in that subquery.)

EDIT

I'm thinking that your query should include a predicate on mm.visitorID = c.senderID (that's based on the normative pattern we usually see in queries like this.

It looks like you want the position (x,y) at the earliest timestamp from m that's after the timestamp on c... and you've got a condition (predicate) that matches visitorID to senderID. I think you want to repeat that same condition in the check for "earliest" timestamp. (Otherwise, you could get a timestamp for some other visitorID)

Just that one change might speed up your query a bit, given a suitable index

... ON movement (vistorID, timestamp)

Actually including x and y in that index as well would let those values be returned from the index, and you'll get Using index in your EXPLAIN output.


Taking that change into account, here's a first cut at a re-write that avoids a correlated subquery.

SELECT c.senderID
     , c.timestamp
     , m.timestamp  AS m_timestamp
     , m.x
     , m.y
  FROM ( SELECT mc.senderID
              , mc.timestamp
              , MIN(mm.timestamp) AS min_timestamp
           FROM communication mc                    --< table alias mc
           JOIN movement mm                         --< table alias mm
             ON mm.visitorID  = mc.senderID
             ON mm.timestamp >= mc.timestamp
          GROUP BY mc.senderID, mc.timestamp
          ORDER BY mc.senderID, mc.timestamp
       ) r                                          --< table alias r
  JOIN movement m                                   --< table alias m
       ON  m.visitorID   = r.senderID
       AND m.timestamp   = r.min_timestamp
  JOIN communication c
       ON  c.senderID    = r.senderID
       AND c.timestamp   = r.timestamp
 ORDER BY r.senderID, r.timestamp

Here, the inline view (aliased as r is tasked with getting the earliest timestamp from movement). That's going to spin the results into a "temporary" table. (I put temporary in quotes; it's temporary in that its around for the duration of the query, but it's called a derived table in the MySQL vernacular.)

There's a GROUP BY operation; hopefully, we can avoid a Using filesort operation and have that optimized using a suitable index.

Once we have that, it should just be index lookups into c and m.


I'm looking at this again. We don't actually need to do the join to c, we've already got the values from the inline view.

I say give this a rip, and let me know how big a smoke ball it makes. (No guarantees, because this isn't tested.)

  SELECT r.senderID
       , r.timestamp
       , m.timestamp AS m_timestamp
       , m.x
       , m.y
    FROM ( SELECT mc.senderID
                , mc.timestamp
                , MIN(mm.timestamp) AS min_timestamp
             FROM communication mc                     -- < table alias mc
             JOIN movement mm                          -- < table alias mm
               ON mm.visitorID  = mc.senderID
               ON mm.timestamp >= mc.timestamp
            GROUP BY mc.senderID, mc.timestamp
            ORDER BY mc.senderID, mc.timestamp
         ) r                                           -- < table alias r
    JOIN movement m                                    -- < table alias m
         ON  m.visitorID   = r.senderID
         AND m.timestamp   = r.min_timestamp
   ORDER BY r.senderID, r.timestamp

EDIT corrected invalid references c. to mc. in the previous two queries.

Upvotes: 1

Related Questions