SQL count the occurrences of words from one table in second table

Question

I am attempting to use a dialect of SQL to parse a set of records. Specifically I want to count the number records that contain various ids (words) occur in my logs. I also want to count the total number of times these ids are present.

The data is present in two different tables:

id_status - A table contain id,status.
request_records - A table which contains request records. An id might be present more than once in a given record.

See SQL Fiddle!

id_status

This table contains the ids and their current status.

id, status
sessions, ACTIVE
visits, DEPRECATED
duration, ACTIVE
...

request_records

Sentence record may contain over 300k records:

request
example.com/api?foo=sessions
example.com/api?bar=session%2Cvisits,foo=sessions
example.com/api?bar=duration,visits
example.com/api?foo=sessions
example.com/api?foo=visits,bar=visits
...

Goal

I want to create a table containing four columns id,status,occurence_count,record_count.

occurence_count: should be the total number of times an id occurs in all the records.
recourd_count: should be the the total number of records a particular id occurs in.

As a result from the above example I would generate the following table:

id, status, occurence_count, recourd_count
sessions,ACTIVE,3,2
visits,DEPRECATED,4,3
duration,ACTIVE,1,1

SQL

Is there a way to construct an SQL query which will return the desired table? I would know how to do it fairly readily in Python or any other language but if possible I'd like to write a SELECT statement to generate this table.

Update: As suggested below, I have attempted to execute the following:

SELECT ids.id, ids.status,
     SUM(length(request) - LENGTH(replace(request, ids.id, ''))) / LENGTH(ids.id) as occurence_count,
     COUNT(reqs.request) AS recourd_count
FROM id_status ids LEFT JOIN
     request_records reqs
     ON find_in_set(ids.id, reqs.requests) > 0
GROUP BY ids.id, ids.status;

However my dialect of SQL does not permit this type of join and returns the following error:

ON clause must be AND of = comparisons of one field name from each table, with all field names prefixed with table name.

Added SQL Fiddle with Sample Schema:

CREATE TABLE id_status
    (`id` varchar(32), `status` varchar(32))
;

INSERT INTO id_status
    (`id`, `status`)
VALUES
    ('sessions', 'ACTIVE'),
    ('visits', 'DEPRECATED'),
    ('duration', 'ACTIVE')
;

CREATE TABLE request_records 
    (`request` varchar(500))
;

Steve Chambers · Accepted Answer

Simple version:

SELECT id,
       status,
       (SELECT (SUM(CHAR_LENGTH(request)) -
                SUM(CHAR_LENGTH(REPLACE(request, id, ''))))
               / CHAR_LENGTH(id) FROM request_records) AS occurrence_count,
       (SELECT COUNT(*)
        FROM request_records
        WHERE INSTR(request, id) > 0) AS record_count
FROM id_status

See updated SQL fiddle.

This just looks for the word being present anywhere so for example would blindly count abcsessions or sessionsxyz when looking for sessions. If something more complex is needed (e.g. considering word boundaries), you might need to use a regular expression replacer rather than the REPLACE function - but unfortunately this is one thing MySQL lacks. I once had a go at writing one but it's not trivial - see here.

SQL count the occurrences of words from one table in second table

id_status

request_records

Goal

SQL

Answers (2)

Related Questions