George Milonas
George Milonas

Reputation: 563

mySQL Massive Query Optimization

This question is going out to all the mySQL gurus out there. I've been building out an ad network, and have been working on trying to optimize the main SQL call that grabs the ads. It checks for ads that meet the follow criteria:

  1. Geography (geo)
  2. Operating System (os)
  3. Device Type (device)
  4. Ad Status (enabled)
  5. If the conversion type of the ad falls inside the given conversion types (conversion)
  6. If the number of ad clicks per IP is less then the max frequency allowed (frequency)
  7. The subid is currently not in the blacklist for that ad (ws.nid IS NULL)
  8. If the ad has adometry enabled or not, and not show any ads if the IP/UA is on the adometry list and adometery for that ad is on
  9. Checks the current publishers conversion click through rate compared to the ads max click through rate, and doesnt show that ad if the current ads click through rate is over their max and the publishers click through rate is also over that max
  10. Checks the current ads budget compared to what it should be at (by their max daily budget) and not show any ads if its higher then what it should be.
  11. Does a keyword search depending on the match type set by the ad.

This is the current SQL query I am running:

SELECT
    w.nid as w_nid,
    w.uid as w_uid,
    w.status as w_status,
    w.landing_page as w_landing_page,
    w.starting_bid as w_starting_bid,
    w.daily_budget as w_daily_budget,
    w.revshare as w_revshare,
    w.filters as w_filters,
    w.device as w_device,
    w.os as w_os,
    w.conversion as w_conversion,
    w.max_ctr as w_max_ctr,
    w.frequency as w_frequency,
    w.ad_title as w_ad_title,
    w.ad_desc as w_ad_desc,
    w.ad_728x90 as w_ad_728x90,
    w.ad_300x250 as w_ad_300x250,
    w.ad_160x600 as w_ad_160x600,
    w.match_type as w_match_type,
    wg.nid as wg_nid,
    wg.geo as wg_geo,
    wk.keyword as wk_keyword,
    wk.nid as wk_nid,
    IFNULL(wcs.estimate,0) as wcs_spend,
    IFNULL(((wcs.conversions/wcs.conversion_impressions)*100),0) as wcs_ctr,
    pss.bid as pss_bid,
    pss.ctr as pss_ctr,
    wci.count as wci_count,
    ws.*
FROM
    websites w
LEFT JOIN
    websites_geos wg
ON
    wg.nid = w.nid
LEFT JOIN
    websites_keywords wk
ON
    wk.nid = w.nid
LEFT JOIN
    api_bucket_website_daily_clicks wcs
ON
    wcs.nid = w.nid AND wcs.date = CURDATE()
LEFT JOIN
    publisher_subid_stats pss
ON
    pss.uniq  = CONCAT(w.nid,'_',:pid
    ,'_',:subid)
LEFT JOIN
    websites_cur_ips wci
ON
    wci.unique = CONCAT(CURDATE(),:ip,w.nid)
LEFT JOIN
    websites_subids ws
ON
    w.nid = ws.nid AND CONCAT(:pid,'_',:subid) = ws.subid
WHERE
    (
            (
                match_type = 0 /* MATCH RON KEYWORDS */
            )
        OR
            (
                wk.keyword = :keyword /* MATCH EXACT KEYWORD */
                AND
                match_type = 1
            )
        OR
            (
                :keyword LIKE CONCAT('%',wk.keyword,'%') /* MATCH PHRASE KEYWORD */
                AND
                match_type = 2
            )
        OR
            (
                :keyword LIKE CONCAT('%',REPLACE(wk.keyword, ' ', '%'),'%') /* MATCH BROAD KEYWORD */
                AND
                match_type = 3
            )
    )
AND
    wg.geo = :geo
AND
    w.os = :os
AND
    w.device = :device
AND
    w.enabled = 1
AND
    w.conversion IN (:conversiontype)
AND
    ((:sectoday/86400) * w.daily_budget) >= IFNULL(wcs.estimate,0)
AND
    IFNULL(wci.count,0) < w.frequency
AND
    ws.nid IS NULL  
AND
    ((:adometry = 0) OR (:adometry = 1 AND  w.filters = 0))
AND
    (
        (
            IFNULL(((wcs.conversions/wcs.conversion_impressions)*100),0) <= w.max_ctr  
        AND 
            IFNULL(((wcs.conversions/wcs.conversion_impressions)*100),0) >= IFNULL(pss.ctr,0)
        )
        OR
        (
            IFNULL(((wcs.conversions/wcs.conversion_impressions)*100),0) >= IFNULL(pss.ctr,0)
        )
    )
ORDER BY 
    IFNULL(pss.bid,w.starting_bid) DESC, RAND()

I look at these query and I cry because even though it responds super fast now, we are planning on receiving north of a billion queries a day on it, with well over 500 advertisers and I want to make sure it is as optimized as possible. Also let me know if you require more info!

Upvotes: 0

Views: 135

Answers (2)

deroby
deroby

Reputation: 6002

A couple of remarks that might be worth looking into, then again they might not be beneficial at all so you'll probably want to (thoroughly) test them out first!

  1. I probably would convert the part in the WHERE regarding the match_type into a CASE WHEN structure

  2. You have AND wg.geo = :geo in the WHERE clause, but websites_geo is linked via a left outer join. Either move the 'filter' to the JOIN or make the connection a normal JOIN.

  3. ((:sectoday/86400) * w.daily_budget) >= IFNULL(wcs.estimate,0)

    • IF there is an index on daily_budget then you should move the ((:sectoday/86400) to the right
    • I'm assuming that w.dail_budget is always going to be a positive number, hence the left part is always >= 0. I'm also assuming that wcs.estimate is a non-NULLable field, so the only reason there is an IFNULL() here is because of the LEFT OUTER JOIN to api_bucket_website_daily_clicks, right ? If so, then you could move the filter to the LEFT JOIN statement and remove the IFNULL() part
    • I don't have a clue how smart mysql is on this, but it might be useful to calculate (:sectoday/86400) upfront and store it in a variable.
  4. IFNULL(wci.count,0) < w.frequency

    • Similar to above, move to the relevant JOIN part of the query
  5. ((:adometry = 0) OR (:adometry = 1 AND w.filters = 0))

    • Assuming :adometry is a bit, you can shorten this to (:adometry = 0 OR w.filters = 0)
  6. the part

    IFNULL(((wcs.conversions/wcs.conversion_impressions)*100),0) <= w.max_ctr 
    AND 
    IFNULL(((wcs.conversions/wcs.conversion_impressions)*100),0) >= IFNULL(pss.ctr,0)
    

    can be shortened to

    IFNULL(((wcs.conversions/wcs.conversion_impressions)*100),0) BETWEEN IFNULL(pss.ctr,0) AND w.max_ctr 
    

    which might not be faster but seems easier to read IMHO.

  7. When doing nbr 6, and 'mentally' replacing IFNULL(((wcs.conversions/wcs.conversion_impressions)*100),0) by actual_ctr I get the following code

    atcual_ctr BETWEEN IFNULL(pss.ctr,0) AND w.max_ctr  
    OR
    atcual_ctr >= IFNULL(pss.ctr,0)
    

    which IMHO sounds like something that doesn't quite make sense (bug?)

  8. I was going to make a remark about Rand() potentially not being 'renewed' for every record, but it turns out it is in mysql. (for reference: it wouldn't work in MSSQL though)

  9. Curiously you select ws.* but given the requirement on websites_subids I'm assuming you're actually looking for records in websites that have no match in websites_subids and thus all fields of said table will return NULL. I can't but wonder how to interpret this.

As a more generic remark: I don't have a clue how your data will look like regarding active and inactive records (is it 1%-99% or 50&-50% ?), but if performance would become an issue I'd consider removing the field enabled from the table and change your logic to having 2 tables, one with 'obsolete' records and one with active ones. If needed you can still foresee a view that UNION's both tables in case you prefer not to make too many changes to the logic that is being used to maintain this particularity of the data (partitiong might be an option too).

Upvotes: 1

Tim Burch
Tim Burch

Reputation: 1103

A few suggestions: (1) don't join on functions, e.g., CONCAT, if you can help it; (2) in a similar vein, try to cut down on the calculations in your WHERE clauses, e.g., "((:sectoday/86400) * w.daily_budget) >= IFNULL(wcs.estimate,0)," perhaps storing the results of those calculations in an index table/column; (3) consider making a FULLTEXT index out of wk.keyword (I would guess it's not one currently, given the WHERE conditions of your query) and use a single MATCH AGAINST clause instead of the multiple LIKE clauses. Good luck!

Upvotes: 0

Related Questions