Reputation: 2784
I have this table, attendance (4M rows at the moment, growing 1.2M per week):
--------------------------------------------------------------
| member_id | attendance_week | attendance_date | event_id   |
--------------------------------------------------------------
| INT(10)   | TINYINT(2)      | TIMESTAMP       | TINYINT(3) |
--------------------------------------------------------------
attendance indexes:
---------------------------------------------------
| PRIMARY   (attendance_week, member_id, event_id) |
| member_id (member_id)                            |
| event_id  (event_id, attendance_week)            |
| total     (attendance_week, event_id)            |
---------------------------------------------------
members (400k rows at the moment, growing 750 a week):
---------------------------
| member_id | dept_id     |
---------------------------
| INT(10)   | SMALLINT(5) |
---------------------------
members indexes:
-----------------------
| PRIMARY (member_id) |
-----------------------
Events are weekly, meaning you'll see pairs of member_id and event_id for each week.
Now I have to generate a report for a certain department showing, for each event, the current attendance (i.e. whether a member has already checked in) and their attendance over at least 4 weeks (i.e. events attended / total events over that duration).
This is for the current_attendance part of the report. I fetch all members for a department and LEFT JOIN them with this week's event to get NULL for absences:
SELECT
    m.member_id AS id,
    a.event_id AS attended
FROM
    members AS m
LEFT JOIN
    attendance AS a
ON
    a.member_id = m.member_id AND
    a.attendance_week = :week AND
    a.event_id = :event
WHERE
    m.dept_id = :dept
GROUP BY
    m.member_id
This is for the attended part of the report:
SELECT
    a.member_id,
    COUNT(a.event_id)
FROM
    attendance a
JOIN
    members m
ON
    a.member_id = m.member_id AND
    m.dept_id = :dept
WHERE
    a.attendance_week BETWEEN :start AND :end
GROUP BY
    a.member_id
I could probably merge these two queries by simply LEFT JOINing the attendance table again in the first query.
And finally, for the total part:
SELECT
    attendance_week,
    COUNT(DISTINCT event_id)
FROM
    attendance
WHERE
    attendance_week BETWEEN :start AND :end
GROUP BY
    attendance_week
These are the main queries that will be run on these tables. At the moment the queries run for an average of 150-200ms (according to phpMyAdmin), which I think is slow. EXPLAIN tells me that my indexes are being used.
So here are my questions: I've tried adding SQL_NO_CACHE and I still get the same response time, and query_cache_size is 0. I could swear that I saw phpMyAdmin report the queries at around 800ms once (which is unacceptable), but I don't get that now. How do I measure the true speed of my queries every time they are run?

If you ever need more info, please comment below. I'll try to provide it. I really did try to do this alone, but given the demands of a huge database (my largest so far) and high performance, I really need some advice :D
Thanks
EDIT
I just realized a terrible flaw in my logic: newly registered members will show up with low attendance performance, since the 3rd query doesn't take registration date into account. I have a registration_date column in my members table; is there any way I can incorporate that variable into the query? Or merge all three queries into just one, since they all return values that depend on each member?
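Something like this is what I'm imagining for a per-member total, though I'm not sure it's correct (it assumes registration_date can be mapped to a week number comparable with attendance_week, here via WEEK()):

SELECT
    m.member_id,
    -- events held since this member registered (within the report range)
    (SELECT COUNT(DISTINCT t.attendance_week, t.event_id)
     FROM attendance AS t
     WHERE t.attendance_week
           BETWEEN GREATEST(:start, WEEK(m.registration_date)) AND :end
    ) AS total_events
FROM
    members AS m
WHERE
    m.dept_id = :dept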
EDIT
I've managed to merge the first two queries:
SELECT
    m.member_id AS id,
    a.event_id AS attended,
    COUNT(b.event_id) AS total_attended
FROM
    members AS m
LEFT JOIN
    attendance AS a
ON
    a.member_id = m.member_id AND
    a.attendance_week = :week AND
    a.event_id = :event
LEFT JOIN
    attendance AS b
ON
    b.member_id = m.member_id AND
    b.attendance_week BETWEEN :start AND :end
WHERE
    m.dept_id = :dept
GROUP BY
    m.member_id
This query runs for 925ms on the first run and 15ms on subsequent requests.
This is the result of the above query's EXPLAIN:
members table:
id: 1
select_type: SIMPLE
table: m
type: ref
possible_keys: dept_id
key: dept_id
key_len: 3
ref: const
rows: 88
Extra: Using where; Using index
attendance table 1 (for the boolean attended part):
id: 1
select_type: SIMPLE
table: a
type: eq_ref
possible_keys: PRIMARY,member_id,event_id,total
key: PRIMARY
key_len: 6
ref: const,arms_db.m.member_id,const
rows: 1
Extra: Using index
attendance table 2 (for the total attended part):
id: 1
select_type: SIMPLE
table: b
type: ref
possible_keys: PRIMARY,member_id,total
key: member_id
key_len: 4
ref: arms_db.m.member_id
rows: 5
Extra: Using index
And the EXPLAIN for the last query:
id: 1
select_type: SIMPLE
table: attendance
type: range
possible_keys: PRIMARY,total
key: total
key_len: 2
ref: NULL
rows: 9
Extra: Using where; Using index for group-by
Upvotes: 3
Views: 279
Reputation: 29121
Adding covering or clustered indexes on your tables will give you the best performance.
You can also add an extra index on the members table:
members index: (member_id, dept_id)
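For example (a sketch; the index name is just a suggestion):

ALTER TABLE members
    ADD INDEX member_dept (member_id, dept_id);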
You can enable the Query Cache to cache query output, but note that the Query Cache doesn't work with procedures. To measure the exact speed of your queries you can use the mysqlslap client utility.
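For a quick in-session check you can also use MySQL's built-in profiler (a sketch using one of your queries; available in MySQL 5.x, substitute your real parameters for the literals):

SET profiling = 1;   -- time every statement in this session

SELECT attendance_week, COUNT(DISTINCT event_id)
FROM attendance
WHERE attendance_week BETWEEN 1 AND 4
GROUP BY attendance_week;

SHOW PROFILES;       -- lists each statement with its measured duration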
Queries inside a stored procedure won't make much difference in terms of speed, but they will save some of the additional overhead of query parsing and sending output to the client.
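As a sketch, wrapping your first query in a procedure could look like this (the procedure name and parameter types are illustrative):

DELIMITER //
CREATE PROCEDURE current_attendance(
    IN p_week  TINYINT,
    IN p_event TINYINT,
    IN p_dept  SMALLINT
)
BEGIN
    SELECT m.member_id AS id, a.event_id AS attended
    FROM members AS m
    LEFT JOIN attendance AS a
           ON a.member_id = m.member_id
          AND a.attendance_week = p_week
          AND a.event_id = p_event
    WHERE m.dept_id = p_dept;
END //
DELIMITER ;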
Distributing data over different servers using sharding or replication will help you in terms of scalability. Partitioning huge tables will also benefit you.
Upvotes: 2
Reputation: 22905
Your design seems valid. I think that having reports done within 200ms (even up to 800ms) is perfectly fine for reporting applications. As for the new indexes, I would first check whether they're really worth it, because, say, if all your members are spread equally over only 5 departments, then an index on member.dept_id will not be useful: it is cheaper to perform a full scan in such a case.
I don't see the point of measuring the "true" speed of the queries, as databases are there to speed up data access by effectively caching your data. So if you're in a situation where, on a freshly started DB server, your query takes around 800ms and further executions go down to 50-100ms, then this is a good setup, and this is what I aim for in my daily job.
I doubt it, as stored procedures cost you a small amount of extra time to execute the procedure and obtain its results, compared to the benefit of having all statements parsed by the time the procedure is called.
At the moment your speed is just fine for a non-OLTP application. And it seems to me that partitioning the attendance table by the attendance_week column will give you a nice performance boost, as all your queries revolve around this column. The benefits will become visible once you have more data in the system, though, at least 3-4 weeks' worth of it.
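For example, range partitioning by week might look like this (a sketch; the boundaries are illustrative, and note that the partitioning column must appear in every unique key, which your PRIMARY KEY already satisfies):

ALTER TABLE attendance
PARTITION BY RANGE (attendance_week) (
    PARTITION p_w04 VALUES LESS THAN (5),
    PARTITION p_w08 VALUES LESS THAN (9),
    PARTITION p_w12 VALUES LESS THAN (13),
    PARTITION p_max VALUES LESS THAN MAXVALUE
);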
My assumptions might be wrong for an OLTP system, though. Could you specify the intended usage area of the provided example?
Also, it'd be good to see the actual output of the EXPLAIN statements for your queries.
Upvotes: 0