Reputation: 589
I know that using index could optimize SQL with GROUP BY
or WHERE
clause. But how to optimize SQL with GROUP BY
and WHERE
? Please see my case.
I have a table to store traffic data and use it to draw a network traffic topology. Below is the table structure :
DROP TABLE IF EXISTS `data`;
CREATE TABLE `data`
(
`sip` varbinary(16) DEFAULT NULL,
`dip` varbinary(16) DEFAULT NULL,
`app` char(96) DEFAULT NULL,
`up` bigint(20) DEFAULT NULL,
`down` bigint(20) DEFAULT NULL,
`dtime` datetime DEFAULT CURRENT_TIMESTAMP,
KEY `dtime` (`dtime`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And create index on column dtime
.
The simplified SQL is
SELECT
INET6_NTOA(sip),
INET6_NTOA(dip),
app,
sum(up) AS up,
sum(down) AS down
FROM
`data`
WHERE
`data`.dtime > FROM_UNIXTIME(1583031879)
AND `data`.dtime < FROM_UNIXTIME(1585537477)
GROUP BY
sip,
dip,
app
The table can store about 10,000,000 records for one month and our requirement is to draw a network traffic topology with granularity for Last 30 days, Last 24 hours, Last 1 hour.
Obviously the index dtime
is helpful to query data for last 1 hour or last 24 hours. But when query for last 30 days, it's a full table scan.
In extreme case, querying for 24 hours costs 5s, which is acceptable, while querying for 30 days costs 60s+ which is hard to accept.
Create index for sip,dip,app ? Seems not helpful, for I have to filter data by dtime at first. I googled the various indexing solutions, which may not suit for me.
Any ideas to speed up my SQL ? Or Any ideas to improve table design ? Thank you very much.
Upvotes: 2
Views: 76
Reputation: 142306
Simply put, a "range" in the WHERE
prevents having the index being useful for the GROUP BY
or ORDER BY
. You could add INDEX(sip, dip, app)
to give the Optimizer a choice.
Every table needs a PRIMARY KEY
. Perhaps it could be PRIMARY KEY(sip, dip, app)
? Or just (sip, dip)
? Note, making this the PK would be better than a simple INDEX
.
But the real performance gain for those reports would be to build and maintain a "summary table" with a granularity of 1 hour. One hour would let you efficiently get hour, day, month, etc. See http://mysql.rjweb.org/doc.php/summarytables And, because the summary table would be so much smaller, it won't matter when it needs a table scan.
Are the VARBINARY(16)
some form hash? Or a pair of IP addresses? If they are fixed-length, consider CHAR(16)
.
How big can up
and down
be? Perhaps you don't need 8-byte BIGINT
? (Saving space would help performance some.
MySQL uses only one index per SELECT
. The Optimizer looks at each index (including the PRIMARY KEY
) to estimate which one would be best. In your case, it would weigh the choice between using INDEX(dtime)
for filtering versus using INDEX(sip,dip,app)
to avoid sorting.
If the WHERE
clause filters down to very few rows; the dtime
index would be better, and the Optimizer would probably use it. And vice versa.
Upvotes: 1
Reputation: 2062
could you try to use an index(sip, dip, app)(3 columns index)? I would think this may be helpful.
Upvotes: 0