Gregory Rosenberg

Reputation: 17

How to use regex with an array of keywords to replace?

I am trying to create a loop that replaces certain words with their uppercase version. However, I cannot get it to work with capture groups, as I need to uppercase only words surrounded by whitespace or a start-of-line marker. If I understand correctly, \b is the boundary matcher? The list below is shortened for convenience.

raw_text = 'crEate Alter Something banana'
var lower_text = raw_text.toLowerCase();
var sql_keywords = ['ALTER', 'ANY', 'CREATE']
for (i = 0; i < sql_keywords.length; i++){
    search_key = '(\b)' + sql_keywords[i].toLowerCase() + '(\b)';
    replace_key = sql_keywords[i].toUpperCase();
    lower_text = lower_text.replace(search_key, '$1' + replace_key + '$2');
}

It loops fine, but the replace fails. I assume I have formatted it incorrectly, but I cannot work out the correct format. To be clear, it should search for a word bounded by either the start of the line or a space, then replace the word with its uppercase version while preserving the boundaries.

Upvotes: 0

Views: 715

Answers (2)

Alexandre Senges

Reputation: 1599

You can use the RegExp constructor.

Then make a function:

const listRegexp = list => new RegExp(list.map(word => `(${word})`).join("|"), "gi");

Then use it:

const re = listRegexp(sql_keywords);

Then replace:

const output = raw_text.replace(re, x => x.toUpperCase());

Upvotes: 0

trincot

Reputation: 350750

Several issues:

  • A backslash inside a string literal is an escape character, so if you intend to have a literal backslash (for the purpose of generating regex syntax), you need to double it.
  • You did not create a regular expression. A dynamic regular expression is created with a call to RegExp; when replace is given a plain string, it searches for that string literally.
  • You would want to provide regex option flags, including g for global, and you might as well ease things by adding the i (case insensitive) flag.
  • There is no reason to make a capture group of a \b as it represents no character from the input. So even if your code worked, $1 and $2 would just resolve to empty strings -- they serve no purpose.
  • You convert the whole input to lower case, so words that are not matched lose their capitalisation.
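The first two points can be checked in isolation. This is a minimal sketch, assuming a single keyword and a made-up input; the variable names are illustrative and not taken from the question:

```javascript
// In a string literal, "\b" is the backspace character (U+0008),
// not a backslash followed by the letter b.
var single = "\b";
var doubled = "\\b";
console.log(single.length, single.charCodeAt(0)); // 1 8
console.log(doubled.length);                      // 2

var text = "create table"; // hypothetical input

// A string pattern is searched for literally, so nothing is found
// and the input comes back unchanged.
var unchanged = text.replace("(\\b)create(\\b)", "CREATE");

// A RegExp built from the doubled-backslash string does match.
var pattern = new RegExp("\\bcreate\\b", "gi");
var replaced = text.replace(pattern, "CREATE");

console.log(unchanged); // create table
console.log(replaced);  // CREATE table
```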

It will be easier when you create one regular expression for all at the same time, and use the callback argument of replace:

var raw_text = 'crEate Alter Something banana';
var sql_keywords = ['ALTER','ANY','CREATE'];
var regex = RegExp("\\b(" + sql_keywords.join("|") + ")\\b", "gi");
var result = raw_text.replace(regex, word => word.toUpperCase());

console.log(result);

BTW, you probably also want to match reserved words when they are followed by punctuation, such as a comma. \b matches at any position where a word character (a letter, digit or underscore) is adjacent to a non-word character or to the start or end of the input, so that seems fine.
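A quick way to see how \b treats punctuation is to run the same regex over a few edge cases. This is only a sketch: the helper name upperKeywords and the sample inputs are made up for illustration.

```javascript
var keywords = ["ALTER", "ANY", "CREATE"];
var keywordRegex = new RegExp("\\b(" + keywords.join("|") + ")\\b", "gi");

// Hypothetical helper: uppercases every whole-word keyword match.
function upperKeywords(text) {
    return text.replace(keywordRegex, function (word) {
        return word.toUpperCase();
    });
}

// Keywords followed by punctuation are still matched.
console.log(upperKeywords("create, alter; any.")); // CREATE, ALTER; ANY.

// "any" inside a longer word is left alone.
console.log(upperKeywords("company many"));        // company many

// An underscore counts as a word character, so there is no boundary
// between "create" and "_table".
console.log(upperKeywords("create_table"));        // create_table
```

Whether the last case is what you want depends on the SQL dialect and naming conventions, since identifiers such as create_table are normally not reserved words.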

Upvotes: 3
