Steven Guerrero
Steven Guerrero

Reputation: 1056

Regex to match string not inside parentheses

I been struggling to find a Regex that help me match 3 different strings only if they aren't inside parentheses, but so far I have only managed to match it if it's right next to the parentheses, and in this specific situation it doesn't suit me.

To clarify I need to match the Strings "HAVING", "ORDER BY" and "GROUP BY" that aren't contained in any parentheses, no matter if the parentheses contains more than just the string.

In that case:

Select *
from some_table
group by something;

Should match, but:

Select *
from(
   Select *
   from some_other_table
   group by something_else
)

or

Select this, and_this
from(
   Select *
   from some_other_table
   having some_condition
)

shouldn't.

I'm not an expert in Javascript Regex, so any help you could give me would be greatly appreciated.

Upvotes: 0

Views: 287

Answers (1)

melpomene
melpomene

Reputation: 85767

I assume you want to check whether a given SQL query contains HAVING, ORDER BY or GROUP BY at the top level (not within a subquery).

This is complicated by the fact that both parens and words can be contained inside of string literals ('...'), quoted identifiers ("..."), and comments (-- ...).

In the following code I assume that that's all that can "go wrong" (i.e. there are no other quoting constructs) and that no quoted characters are special (in particular, \ isn't treated any differently).

Idea:

  • Remove all quoted constructs like string literals and comments.
  • Remove all parenthesized groups.
  • Check the remaining string for your keywords.

And by "remove" I mean "replace by a space" because otherwise there is the possibility of new tokens being created where there were none before (e.g. hav(...)IN"asdf"g would turn into havINg if parenthesized/quoted parts were just replaced by nothing).

Implementation:

function contains_groupy_bits(sql) {
    sql = sql.replace(/'[^']*'|"[^"]*"|--[^\n]*/g, ' ');
    let tmp;
    while ((tmp = sql.replace(/\([^()]*\)/g, ' ')) !== sql) {
        sql = tmp;
    }
    return /\b(?:having|order\s+by|group\s+by)\b/i.test(sql);
}

const examples = [
    `Select *
    from some_table
    group by something;`,

    `Select *
    from(
       Select *
       from some_other_table
       group by something_else
    )`,

    `Select this, and_this
    from(
       Select *
       from some_other_table
       having some_condition
    )`,

    `select name, count(*) from things
    where mark = '('
    group by name -- )`,
];

for (const ex of examples) {
    console.log("'" + ex + "': " + contains_groupy_bits(ex));
}

Upvotes: 1

Related Questions