Dmitry Samoylov

Reputation: 1318

How to match repeated patterns with optional spaces between them?

I need to perform the following extraction from a string in JS:

'Sdfg dfg ldfgh (abc)' => ['abc']
'Sdfg dfg ldfgh (abc) ' => ['abc']
'Sdfg dfg ldfgh (abc) (cde)' => ['abc','cde']
'Sdfg dfg ldfgh (abc)(cde) (efgh)' => ['abc', 'cde', 'efgh']

I need to extract the 'tags' in brackets. They may have spaces between them, and the whole string may also end with a space.

I've tried something like /(\(.*\))(\s?\(.*\))+/, but it's not enough to collect all the tags. How can I extract all I need having these optional spaces between and after tags?

Upvotes: 1

Views: 82

Answers (3)

Dacre Denny

Reputation: 30370

The following regex should achieve what you require:

/(?<=\()\w*(?=\))/g

This pattern roughly translates to:

  1. (?<=\() "Look for a ( and start matching immediately after that character"
  2. \w* "Look for zero or more word characters and match those"
  3. (?=\)) "Look for a ) and stop matching just before that character"
  4. /g "Apply this pattern globally across the whole input string"

Using this pattern with the match() function returns an array of strings (or null if nothing matches), where each string is the content between a pair of parentheses in your input string:

const pattern = /(?<=\()\w*(?=\))/g;

console.log('Sdfg dfg ldfgh (abc)'.match(pattern));
console.log('Sdfg dfg ldfgh (abc) '.match(pattern));
console.log('Sdfg dfg ldfgh (abc) (cde)'.match(pattern));
console.log('Sdfg dfg ldfgh (abc)(cde) (efgh)'.match(pattern));
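
If lookbehind support is a concern (it is a relatively recent addition to JavaScript engines), a similar result can probably be had with a capturing group and matchAll(). This is only a sketch: the extractTags name is made up for illustration, and unlike match() it returns an empty array rather than null when nothing matches.

```javascript
// Hypothetical helper name; not part of the original answer.
function extractTags(input) {
  // Capture the word characters between each pair of parentheses.
  const pattern = /\((\w*)\)/g;
  // matchAll yields one match object per hit; index 1 holds the capture group.
  return Array.from(input.matchAll(pattern), (m) => m[1]);
}

console.log(extractTags('Sdfg dfg ldfgh (abc)(cde) (efgh)')); // [ 'abc', 'cde', 'efgh' ]
console.log(extractTags('no tags here'));                     // []
```

Note that matchAll() requires the g flag on the pattern, otherwise it throws a TypeError.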

Upvotes: 1

Robby Cornelissen

Reputation: 97152

Using String.prototype.match() with lookahead and lookbehind assertions:

const extract = (string) => string.match(/(?<=\().+?(?=\))/g);

console.log(extract('Sdfg dfg ldfgh (abc)'));
console.log(extract('Sdfg dfg ldfgh (abc) '));
console.log(extract('Sdfg dfg ldfgh (abc) (cde)'));
console.log(extract('Sdfg dfg ldfgh (abc)(cde) (efgh)'));
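
One difference from a \w-based pattern is worth noting: .+? accepts any characters inside the parentheses, so tags containing spaces or hyphens are picked up too. A minimal sketch, assuming you would rather get an empty array than null when there are no tags (the extractOrEmpty name is hypothetical):

```javascript
// Hypothetical wrapper; same pattern as above, with a fallback for "no match".
const extractOrEmpty = (string) => string.match(/(?<=\().+?(?=\))/g) || [];

console.log(extractOrEmpty('Sdfg (a b) (c-d)')); // [ 'a b', 'c-d' ]
console.log(extractOrEmpty('no tags here'));     // []
```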

Upvotes: 0

Wiktor Stribiżew

Reputation: 626937

You may match the consecutive parenthesized substrings and then split the result:

var s = 'Sdfg dfg ldfgh (abc)(cde) (efgh)';
var m = s.match(/\([^()]*\)(?:\s*\([^()]*\))*/) || [""];
console.log(m[0].replace(/^\(|\)$/g, '').split(/\)\s*\(/));

The \([^()]*\)(?:\s*\([^()]*\))* pattern will match:

  • \([^()]*\) - a (, 0+ chars other than ( and ) and then )
  • (?:\s*\([^()]*\))* - 0+ repetitions of
    • \s* - 0+ whitespaces
    • \([^()]*\) - same as above, (...) substring.

The .replace(/^\(|\)$/g, '') part will remove the first ( and the last ), and .split(/\)\s*\(/) will split on ) and ( with any amount of whitespace in between.
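
For reuse, the steps above could be wrapped in a function along these lines. This is a sketch only: splitTagRun is a made-up name, and returning an empty array when nothing matches is my assumption (the snippet above would yield [""] in that case).

```javascript
// Hypothetical helper name; wraps the match/replace/split steps described above.
function splitTagRun(s) {
  // Match the first run of parenthesized substrings separated by optional whitespace.
  const m = s.match(/\([^()]*\)(?:\s*\([^()]*\))*/);
  // Assumption: return an empty array when there is no parenthesized substring.
  if (!m) return [];
  // Strip the outer ( and ), then split on each ")(" boundary (whitespace allowed).
  return m[0].replace(/^\(|\)$/g, '').split(/\)\s*\(/);
}

console.log(splitTagRun('Sdfg dfg ldfgh (abc)(cde) (efgh)')); // [ 'abc', 'cde', 'efgh' ]
console.log(splitTagRun('Sdfg dfg ldfgh (abc) '));            // [ 'abc' ]
console.log(splitTagRun('no tags here'));                     // []
```

Since the pattern has no g flag, only the first run of consecutive tags is used; tags separated by other text (e.g. 'a (x) b (y)') would give just ['x'].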

Upvotes: 0
