Reputation: 563
This question is going out to all the mySQL gurus out there. I've been building out an ad network, and have been working on trying to optimize the main SQL call that grabs the ads. It checks for ads that meet the follow criteria:
This is the current SQL query I am running:
SELECT
w.nid as w_nid,
w.uid as w_uid,
w.status as w_status,
w.landing_page as w_landing_page,
w.starting_bid as w_starting_bid,
w.daily_budget as w_daily_budget,
w.revshare as w_revshare,
w.filters as w_filters,
w.device as w_device,
w.os as w_os,
w.conversion as w_conversion,
w.max_ctr as w_max_ctr,
w.frequency as w_frequency,
w.ad_title as w_ad_title,
w.ad_desc as w_ad_desc,
w.ad_728x90 as w_ad_728x90,
w.ad_300x250 as w_ad_300x250,
w.ad_160x600 as w_ad_160x600,
w.match_type as w_match_type,
wg.nid as wg_nid,
wg.geo as wg_geo,
wk.keyword as wk_keyword,
wk.nid as wk_nid,
IFNULL(wcs.estimate,0) as wcs_spend,
IFNULL(((wcs.conversions/wcs.conversion_impressions)*100),0) as wcs_ctr,
pss.bid as pss_bid,
pss.ctr as pss_ctr,
wci.count as wci_count,
ws.*
FROM
websites w
LEFT JOIN
websites_geos wg
ON
wg.nid = w.nid
LEFT JOIN
websites_keywords wk
ON
wk.nid = w.nid
LEFT JOIN
api_bucket_website_daily_clicks wcs
ON
wcs.nid = w.nid AND wcs.date = CURDATE()
LEFT JOIN
publisher_subid_stats pss
ON
pss.uniq = CONCAT(w.nid,'_',:pid
,'_',:subid)
LEFT JOIN
websites_cur_ips wci
ON
wci.unique = CONCAT(CURDATE(),:ip,w.nid)
LEFT JOIN
websites_subids ws
ON
w.nid = ws.nid AND CONCAT(:pid,'_',:subid) = ws.subid
WHERE
(
(
match_type = 0 /* MATCH RON KEYWORDS */
)
OR
(
wk.keyword = :keyword /* MATCH EXACT KEYWORD */
AND
match_type = 1
)
OR
(
:keyword LIKE CONCAT('%',wk.keyword,'%') /* MATCH PHRASE KEYWORD */
AND
match_type = 2
)
OR
(
:keyword LIKE CONCAT('%',REPLACE(wk.keyword, ' ', '%'),'%') /* MATCH BROAD KEYWORD */
AND
match_type = 3
)
)
AND
wg.geo = :geo
AND
w.os = :os
AND
w.device = :device
AND
w.enabled = 1
AND
w.conversion IN (:conversiontype)
AND
((:sectoday/86400) * w.daily_budget) >= IFNULL(wcs.estimate,0)
AND
IFNULL(wci.count,0) < w.frequency
AND
ws.nid IS NULL
AND
((:adometry = 0) OR (:adometry = 1 AND w.filters = 0))
AND
(
(
IFNULL(((wcs.conversions/wcs.conversion_impressions)*100),0) <= w.max_ctr
AND
IFNULL(((wcs.conversions/wcs.conversion_impressions)*100),0) >= IFNULL(pss.ctr,0)
)
OR
(
IFNULL(((wcs.conversions/wcs.conversion_impressions)*100),0) >= IFNULL(pss.ctr,0)
)
)
ORDER BY
IFNULL(pss.bid,w.starting_bid) DESC, RAND()
I look at these query and I cry because even though it responds super fast now, we are planning on receiving north of a billion queries a day on it, with well over 500 advertisers and I want to make sure it is as optimized as possible. Also let me know if you require more info!
Upvotes: 0
Views: 135
Reputation: 6002
A couple of remarks that might be worth looking into, then again they might not be beneficial at all so you'll probably want to (thoroughly) test them out first!
I probably would convert the part in the WHERE
regarding the match_type
into a CASE WHEN
structure
You have AND wg.geo = :geo
in the WHERE
clause, but websites_geo
is linked via a left outer join
. Either move the 'filter' to the JOIN
or make the connection a normal JOIN
.
((:sectoday/86400) * w.daily_budget) >= IFNULL(wcs.estimate,0)
((:sectoday/86400)
to the rightIFNULL()
here is because of the LEFT OUTER JOIN
to api_bucket_website_daily_clicks, right ? If so, then you could move the filter to the LEFT JOIN
statement and remove the IFNULL()
part(:sectoday/86400)
upfront and store it in a variable. IFNULL(wci.count,0) < w.frequency
((:adometry = 0) OR (:adometry = 1 AND w.filters = 0))
:adometry
is a bit, you can shorten this to (:adometry = 0 OR w.filters = 0)
the part
IFNULL(((wcs.conversions/wcs.conversion_impressions)*100),0) <= w.max_ctr
AND
IFNULL(((wcs.conversions/wcs.conversion_impressions)*100),0) >= IFNULL(pss.ctr,0)
can be shortened to
IFNULL(((wcs.conversions/wcs.conversion_impressions)*100),0) BETWEEN IFNULL(pss.ctr,0) AND w.max_ctr
which might not be faster but seems easier to read IMHO.
When doing nbr 6, and 'mentally' replacing IFNULL(((wcs.conversions/wcs.conversion_impressions)*100),0)
by actual_ctr
I get the following code
atcual_ctr BETWEEN IFNULL(pss.ctr,0) AND w.max_ctr
OR
atcual_ctr >= IFNULL(pss.ctr,0)
which IMHO sounds like something that doesn't quite make sense (bug?)
I was going to make a remark about Rand() potentially not being 'renewed' for every record, but it turns out it is in mysql. (for reference: it wouldn't work in MSSQL though)
Curiously you select ws.*
but given the requirement on websites_subids
I'm assuming you're actually looking for records in websites
that have no match in websites_subids
and thus all fields of said table will return NULL. I can't but wonder how to interpret this.
As a more generic remark: I don't have a clue how your data will look like regarding active and inactive records (is it 1%-99% or 50&-50% ?), but if performance would become an issue I'd consider removing the field enabled
from the table and change your logic to having 2 tables, one with 'obsolete' records and one with active ones. If needed you can still foresee a view that UNION's both tables in case you prefer not to make too many changes to the logic that is being used to maintain this particularity of the data (partitiong might be an option too).
Upvotes: 1
Reputation: 1103
A few suggestions: (1) don't join on functions, e.g., CONCAT, if you can help it; (2) in a similar vein, try to cut down on the calculations in your WHERE clauses, e.g., "((:sectoday/86400) * w.daily_budget) >= IFNULL(wcs.estimate,0)," perhaps storing the results of those calculations in an index table/column; (3) consider making a FULLTEXT index out of wk.keyword (I would guess it's not one currently, given the WHERE conditions of your query) and use a single MATCH AGAINST clause instead of the multiple LIKE clauses. Good luck!
Upvotes: 0