sccr410
sccr410

Reputation: 151

MySQL query to get most highly matched rows

Have a basic table structure for "domain_keywords"

domain          keyword
golflink.com    golf courses
golflink.com    golf
golflink.com    golf courses near me
...
2ndswing.com    golf clubs
2ndswing.com    used golf clubs
...

So what I am trying to do is a query to take one domain like "golflink.com" and find other domains with matching keywords and get the 10 which have the most matches.

Have tried the following and it works on MySQL 8, but our shared web server (read: cannot upgrade) uses MySQL 5.7 so the "with" clause is not allowed and need an alternate way of doing this query:

WITH t1 AS (
  SELECT keyword, domain
  FROM domain_keywords
  WHERE domain ='golflink.com' 
  GROUP BY domain, keyword
)
SELECT t1.domain AS 'Domain', t2.domain AS 'SimilarDomain', count(t2.keyword) AS 'SharedKeywordsNumber'
FROM t1, domain_keywords t2
WHERE t1.keyword = t2.keyword AND t1.domain != t2.domain
GROUP BY t1.domain, t2.domain 
ORDER BY 3 DESC, 2
LIMIT 10

Upvotes: 1

Views: 47

Answers (2)

Uueerdo
Uueerdo

Reputation: 15951

There is really no need for a CTE or even a subquery for this, also implicit/comma join notation is archaic and explicit joins should be used for all but the most basic throwaway queries. (Also, unless that is a transcription error in your query, that should not run in any version of MySQL; ' is not used for delimiting identifiers; ` is used for that. ' is only used for delimiting strings).

SELECT t1.domain AS `Domain`, t2.domain AS `SimilarDomain`, COUNT(*) AS `SharedKeywordsNumber`
FROM domain_keywords AS t1
INNER JOIN domain_keywords AS t2 ON t1.keyword = t2.keyword AND t1.domain != t2.domain
WHERE t1.domain ='golflink.com' 
GROUP BY t1.domain, t2.domain 
ORDER BY `SharedKeywordsNumber` DESC, `SimilarDomain`
LIMIT 10
;

Another sidenote: The ORDER BY field_ordinal format has been deprecated for years, as it makes queries unnecessarily difficult to read or modify.

Upvotes: 1

Shidersz
Shidersz

Reputation: 17190

A first approach will be using a derived table in the FROM clause similar to the one that you are using inside the WHIT clause, try this:

SELECT
    t1.domain AS 'Domain',
    t2.domain AS 'SimilarDomain',
    count(t2.keyword) AS 'SharedKeywordsNumber'
FROM
    ( SELECT
          keyword, domain
      FROM
          domain_keywords
      WHERE
          domain ='golflink.com' 
      GROUP BY
          domain, keyword ) AS t1
CROSS JOIN
    domain_keywords AS t2
WHERE
    t1.keyword = t2.keyword AND t1.domain != t2.domain
GROUP BY
    t1.domain, t2.domain
ORDER BY
    3 DESC, 2 LIMIT 10

As an improvement, I think you could also replace the CROSS JOIN by one INNER JOIN, in this way (but not 100% sure of this):

SELECT
    t1.domain AS 'Domain',
    t2.domain AS 'SimilarDomain',
    count(t2.keyword) AS 'SharedKeywordsNumber'
FROM
    ( SELECT
          keyword, domain
      FROM
          domain_keywords
      WHERE
          domain ='golflink.com' 
      GROUP BY
          domain, keyword ) AS t1
INNER JOIN
    domain_keywords AS t2 ON t1.keyword = t2.keyword AND t1.domain != t2.domain
GROUP BY
    t1.domain, t2.domain
ORDER BY
    3 DESC, 2 LIMIT 10

Upvotes: 1

Related Questions