Reputation: 39354
Let's say I have two tables (I'm trying to remove everything irrelevant to the question from the tables and make some sample ones, so bear with me :)
___________________ ________________________
|File | |Content |
|_________________| |______________________|
|ID Primary Key | 1 * |ID Primary Key |
|URL Varcher(255) |---------|FileID Foreign Key |
|_________________| | ref File(ID) |
|FileContent Text |
|______________________|
A File has a url. There may be many Content items corresponding to each File.
I need to create a query using these tables that I'm having some trouble with. I essentially want the query, in simple terms, to say:
"Select the file URL and the sum of the times substring "X" appears in all content entries associated with that file."
I'm pretty good with SQL selects, but I'm not so good with aggregate functions and it's letting me down. Any help is greatly appreciated :)
Upvotes: 1
Views: 799
Reputation: 32094
The query won't be efficient but might give you a hint:
SELECT url, cnt
FROM (
SELECT
f.id,
IFNULL(
SUM(
(LENGTH(c.text) - LENGTH(REPLACE(c.text, f.url, '')))/LENGTH(f.url)
),
0
) as cnt
FROM file c
JOIN content c ON f.id = c.fileid
GROUP BY f.id
) cnts JOIN file USING(id);
To append files that do not have a match in the content table you can UNION ALL
the rest of use LEFT JOIN
in the cnts
subquery.
Upvotes: 1
Reputation: 270775
This solution attempts to use REGEXP
to match the substring. REGEXP
returns 1 if it matches, 0 if not, so SUM()
them up for the total. REGEXP
might seem like overkill, but would allow for more complicated matching than a simple substring.
SELECT
File.ID,
File.URL,
SUM(Content.FileContent REGEXP 'substring') AS numSubStrs
FROM File LEFT JOIN Content ON File.ID = Content.ID
GROUP BY File.ID, File.URL;
The easier method if a more complex match pattern won't ever be needed uses LIKE
and COUNT(*)
instead of SUM()
:
SELECT
File.ID,
File.URL,
COUNT(*) AS numSubStrs
FROM File LEFT JOIN Content ON File.ID = Content.ID
WHERE Content.FileContent LIKE '%substring%'
GROUP BY File.ID, File.URL;
Note the use of LEFT JOIN
, which should produce 0 when there are not actually any entries in Content
.
Upvotes: 1