Lucas
Lucas

Reputation: 307

Group by similarity in Oracle SQL

I have a table with several entries that are mostly equal, except for some tokens in them. These are messages returned from a web service and they must be stored this way.

Given this example query:

SELECT entry_error.error_desc, 
       Count(entry_error.id) 
FROM   entry_message err_entries 
       full outer join entry_error 
                    ON err_entries.id = entry_error.id 
       full outer join error_code 
                    ON entry_error.error_code = error_code.error_code 
WHERE NOT EXISTS(SELECT id_father 
                      FROM   entry_message creator 
                      WHERE  err_entries.id = creator.id_father) 
GROUP  BY entry_error.error_desc;

I get an output like this:

entry_error.error_desc                              count(entry_error.id)                
First Sample Text: 321; Second Sample Text: 123;    1
First Sample Text: 456; Second Sample Text: 654;    1 
First Sample Text: 789; Second Sample Text: 987;    1

But I'd like it to be something like:

entry_error.error_desc                              count(entry_error.id)                
First Sample Text: {0}; Second Sample Text: {1};    3

Is it possible to do this directly in my query?

EDIT: Notice that the messages are just a sample and there are several different ones. They cannot be explicitly written in the query. I need the query to group similar generic messages, (using something like UTL_MATCH, idk), that are X% similar.

Upvotes: 0

Views: 696

Answers (1)

shrek
shrek

Reputation: 887

I think this is close to what you are looking for -

WITH ERR AS (
    SELECT
        'FIRST SAMPLE TEXT: 321; SECOND SAMPLE TEXT: 123;' AS ERR_DESC,
        1 AS ERR_COUNT
    FROM
        DUAL
    UNION
    SELECT
        'FIRST SAMPLE TEXT: 456; SECOND SAMPLE TEXT: 654;' AS ERR_DESC,
        1 AS ERR_COUNT
    FROM
        DUAL
    UNION
    SELECT
        'FIRST SAMPLE TEXT: 789; SECOND SAMPLE TEXT: 987;' AS ERR_DESC,
        1 AS ERR_COUNT
    FROM
        DUAL
    UNION
    SELECT
        'FIRST ERROR MESSAGE: 333; SECOND ERROR MESSAGE: 432;' AS ERR_DESC,
        1 AS ERR_COUNT
    FROM
        DUAL
) SELECT
    REGEXP_REPLACE(ERR_DESC,': [0-9]*;','{0}') AS MSG,
    SUM(ERR_COUNT) AS ERROR_COUNT
FROM
    ERR
GROUP BY
    REGEXP_REPLACE(ERR_DESC,': [0-9]*;','{0}');

Output -

"MSG","ERROR_COUNT"
"FIRST SAMPLE TEXT{0} SECOND SAMPLE TEXT{0}",3
"FIRST ERROR MESSAGE{0} SECOND ERROR MESSAGE{0}",1

Upvotes: 2

Related Questions