John Humphreys
John Humphreys

Reputation: 39354

MySQL Select based on count of substring in a column?

Let's say I have two tables (I'm trying to remove everything irrelevant to the question from the tables and make some sample ones, so bear with me :)

    ___________________         ________________________
    |File             |         |Content               |
    |_________________|         |______________________|
    |ID Primary Key   | 1     * |ID Primary Key        |
    |URL Varcher(255) |---------|FileID Foreign Key    |
    |_________________|         |    ref File(ID)      |
                                |FileContent Text      |
                                |______________________|

A File has a url. There may be many Content items corresponding to each File.

I need to create a query using these tables that I'm having some trouble with. I essentially want the query, in simple terms, to say:

"Select the file URL and the sum of the times substring "X" appears in all content entries associated with that file."

I'm pretty good with SQL selects, but I'm not so good with aggregate functions and it's letting me down. Any help is greatly appreciated :)

Upvotes: 1

Views: 799

Answers (2)

newtover
newtover

Reputation: 32094

The query won't be efficient but might give you a hint:

SELECT url, cnt
FROM (
  SELECT
    f.id,
    IFNULL(
      SUM(
        (LENGTH(c.text) - LENGTH(REPLACE(c.text, f.url, '')))/LENGTH(f.url)
      ),
      0
    ) as cnt
  FROM file c
  JOIN content c ON f.id = c.fileid
  GROUP BY f.id
) cnts JOIN file USING(id);

To append files that do not have a match in the content table you can UNION ALL the rest of use LEFT JOIN in the cnts subquery.

Upvotes: 1

Michael Berkowski
Michael Berkowski

Reputation: 270775

This solution attempts to use REGEXP to match the substring. REGEXP returns 1 if it matches, 0 if not, so SUM() them up for the total. REGEXP might seem like overkill, but would allow for more complicated matching than a simple substring.

SELECT
  File.ID,
  File.URL,
  SUM(Content.FileContent REGEXP 'substring') AS numSubStrs
FROM File LEFT JOIN Content ON File.ID = Content.ID
GROUP BY File.ID, File.URL;

The easier method if a more complex match pattern won't ever be needed uses LIKE and COUNT(*) instead of SUM():

SELECT
  File.ID,
  File.URL,
  COUNT(*) AS numSubStrs
FROM File LEFT JOIN Content ON File.ID = Content.ID
WHERE Content.FileContent LIKE '%substring%'
GROUP BY File.ID, File.URL;

Note the use of LEFT JOIN, which should produce 0 when there are not actually any entries in Content.

Upvotes: 1

Related Questions