Jialer Chew
Jialer Chew

Reputation: 369

Count number of WHERE filters in SQL query using regex

Update: I've updated the test string to cover a case that I've missed.

I'm trying to do count the number of WHERE filters in a query using regex.

So the general idea is to count the number of WHERE and AND occuring in the query, while excluding the AND that happens after a JOIN and before a WHERE. And also excluding the AND that happens in a CASE WHEN clause.

For example, this query:

WITH cte AS (\nSELECT a,b\nFROM something\nWHERE a>10\n AND b<5)\n, cte2 AS (\n SELECT c,\nd FROM another\nWHERE c>10\nAND d<5)\n SELECT CASE WHEN c1.a=1\nAND c2.c=1 THEN 'yes' ELSE 'no' \nEND,c1.a,c1.b,c2.c,c2.d\nFROM cte c1\nINNER JOIN cte2 c2 ON c1.a = c2.c\nAND c1.b = c2.d\nWHERE c1.a<4 AND DATE(c1)>'2022-01-01'\nAND c2.c>6

-- FORMATTED FOR EASE OF READ. PLEASE USE LINE ABOVE AS REGEX TEST STRING
WITH cte AS (
  SELECT a,b 
  FROM something 
  WHERE a>10 
    AND b<5
)

, cte2 AS (
  SELECT c,d
  FROM another
  WHERE c>10
    AND d<5
)

SELECT
  CASE
      WHEN c1.a=1 AND c2.c=1 THEN 'yes'
      WHEN c1.a=1 AND c2.c=1 THEN 'maybe'
      ELSE 'no'
  END,
  c1.a,
  c1.b,
  c2.c,
  c2.d
FROM cte c1
INNER JOIN cte2 c2
   ON c1.a = c2.c
  AND c1.b = c2.d
WHERE c1.a<4
  AND DATE(c1)>'2022-01-01'
  AND c2.c>6

should return 7, which are:

  1. WHERE a>10
  2. AND b<5
  3. WHERE c>10
  4. AND d<5
  5. WHERE c1.a<4
  6. AND DATE(c1)>'2022-01-01'
  7. AND c2.c>6

The portion AND c1.b = c2.d is not counted because it happens after JOIN, before WHERE.

The portion AND c2.c=1 is not counted because it is in a CASE WHEN clause.

I eventually plan to use this on a Postgresql query to count the number of filters that happens in all queries in a certain period.

I've tried searching around for answer and trying it myself but to no avail. Hence looking for help here. Thank you in advanced!

Upvotes: 2

Views: 129

Answers (2)

Ng Boon Khai
Ng Boon Khai

Reputation: 36

I try to stay away from lookarounds as they could be messy and too painful to use, especially with the fixed-width limitation of lookbehind assertion.

My proposed solution is to capture all scenarios in different groups, and then select only the group of interest. The undesired scenarios will still be matched, but will not be selected.

  1. Group 1 - Starts with JOIN (undesired)
  2. Group 2 - Starts with WHERE (desired)
  3. Group 3 - Starts with CASE (undesired)
(JOIN.*?(?=$|WHERE|JOIN|CASE|END))|(WHERE.*?(?=$|WHERE|JOIN|CASE|END))|(CASE.*?(?=$|WHERE|JOIN|CASE|END))

Note: Feel free to replace WHERE|JOIN|CASE|END to any keyword you want to be the 'stopper' words. enter image description here

All scenarios including the undesired ones will be matched, but you need to select only Group 2 (highlighted in orange).

Upvotes: 2

gotqn
gotqn

Reputation: 43646

You can try something like this:

WITH DataSource (parts) AS 
(  
  SELECT REGEXP_MATCHES(
    'WITH cte AS (SELECT a,b FROM something WHERE a>10 AND b<5)\n, cte2 AS (SELECT c,d FROM another WHERE c>10 AND d<5)\n SELECT c1.a,c1.b,c2.c,c2.d FROM cte c1 INNER JOIN cte2 c2 ON c1.a = c2.c AND c1.b = c2.d WHERE c1.a<4 AND c2.c>6',
    E'(?= WHERE)[^)|;]+'
  ,'gmi'
  ) 
)
SELECT SUM
       (
           (length(parts[1]) - length(REPLACE(parts[1], 'AND', ''))) / 3 -- counting ANDs
            + 1 -- for the where
      )
FROM DataSource

The idea is to match the text after WHERE clause:

enter image description here

and then simply count the ANDs and add one because of the matched WHERE.

Upvotes: 1

Related Questions