Reputation: 6005
I am trying the regex /^(?<=[\s]*namespace[\s]*---+\s+)(.|\s)+(?=\(\s*\d+\s*rows\))/gm
to extract the row items from a string in single-column tabular list format.
But the leading spaces are included in the match.
The \s+ quantifiers in the lookahead and lookbehind groups do not help. Refer below:
x = `namespace
-------------------
itm1
itm2
itm3
itm4
(4 rows)
`
console.log(x.match(/^(?<=[\s]*namespace[\s]*---+\s+)(.|\s)+(?=\(\s*\d+\s*rows\))/gm)[0].split(/\s+/))
The output has empty strings as the first and last list elements, caused by the leading and trailing whitespace in the match:
[ '', 'itm1', 'itm2', 'itm3', 'itm4', '' ]
But with trim() before the split(..):
console.log(x.match(/^(?<=[\s]*namespace[\s]*---+\s+)(.|\s)+(?=\(\s*\d+\s*rows\))/gm)[0].trim().split(/\s+/))
the output is:
[ 'itm1', 'itm2', 'itm3', 'itm4' ]
Why does the \s+ at the end of the lookbehind group (?<=[\s]*namespace[\s]*---+\s+) not remove all the spaces before the desired matching group caught by (.|\s)+?
Upvotes: 2
Views: 118
Reputation: 626738
The regex engine scans the string from left to right.
It first tries to match at the start of the string. The lookbehind pattern is not found there, so the attempt fails and the next position, between n and a in namespace, is tested. And so on until the newline after the -------------------.
At the location right after that \n, the lookbehind matches: the \s+ at its end only needs one or more whitespace chars immediately before the current position, and the newline char satisfies it. A lookbehind does not consume text and is not required to swallow the whitespace that follows, so the rest of the pattern, (.|\s)+, starts matching right there, indentation included. Hence, there are 15 leading spaces in your result.
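To make the position argument concrete, here is a minimal sketch that checks where the original pattern starts matching. The `sample` string is a made-up input with a three-space indent (an assumption, not the asker's exact data), and `startOfItemsLine` is a helper variable introduced only for this illustration.

```javascript
// Hypothetical sample input: the item lines are indented by three spaces.
const sample = "namespace\n---------\n   itm1\n   itm2\n(2 rows)\n";

// The pattern from the question, unchanged.
const re = /^(?<=[\s]*namespace[\s]*---+\s+)(.|\s)+(?=\(\s*\d+\s*rows\))/gm;
const m = re.exec(sample);

// Index of the first char after the newline that follows the dashes.
const startOfItemsLine = sample.indexOf("\n", sample.indexOf("---")) + 1;

// The match begins at the first position where the lookbehind succeeds,
// so the indentation that follows ends up inside the match.
console.log(m.index === startOfItemsLine); // true
console.log(JSON.stringify(m[0]));         // "   itm1\n   itm2\n"
console.log(m[0].split(/\s+/));            // [ '', 'itm1', 'itm2', '' ]
```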
To fix this, either make sure the consuming part of the pattern starts with a non-whitespace char, or consume the prefix and use a capturing group.
With the first approach:
const x = "namespace\n-------------------\n itm1\n itm2\n itm3\n itm4\n \n(4 rows)\n";
console.log(
  x.match(/(?<=^\s*namespace\s*---+\s+)\S.*?(?=\s*\(\s*\d+\s*rows\))/gms)[0].split(/\s+/)
);
Or, with a capturing group:
const x = "namespace\n-------------------\n itm1\n itm2\n itm3\n itm4\n \n(4 rows)\n";
console.log(
  x.match(/^\s*namespace\s*---+\s+(\S.*?)(?=\s*\(\s*\d+\s*rows\))/ms)[1].split(/\s+/)
);
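Both snippets above index straight into the result of match(), which throws when the input has no table. As a hedged sketch, the capturing-group variant could be wrapped in a small helper that returns an empty array instead. The name `extractRows` and the sample inputs are hypothetical and exist only for this illustration.

```javascript
// Hypothetical helper built around the capturing-group regex shown above.
function extractRows(text) {
  const m = text.match(/^\s*namespace\s*---+\s+(\S.*?)(?=\s*\(\s*\d+\s*rows\))/ms);
  // match() returns null when nothing matches, so guard before reading group 1.
  return m ? m[1].split(/\s+/) : [];
}

// Made-up input with indented items and a blank line before the footer.
const table = "namespace\n-----\n   itm1\n   itm2\n\n(2 rows)\n";
console.log(extractRows(table));           // [ 'itm1', 'itm2' ]
console.log(extractRows("no table here")); // []
```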
Notes on the regexps:
(.|\s)+ is replaced with a plain . pattern, with the s flag added so that . can match line break chars. Please never use (.|\s)*, (.|\n)*, or (.|[\r\n])*: these are very inefficient regex patterns.
\s* is added at the start of the positive lookahead so that the trailing whitespace is stripped from the match.
.*? is used in both patterns to match the least amount of chars between the two delimiting strings.
Upvotes: 2
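The effect of the s flag and of the lazy quantifier can be checked in isolation. This is a small sketch on made-up strings, not on the data from the question; all variable names are illustrative.

```javascript
const text = "itm1\nitm2";

// Without the s flag, "." does not match a line break char.
const withoutS = /itm1.itm2/.test(text);
// With the s flag, "." matches any char, including "\n".
const withS = /itm1.itm2/s.test(text);

// Greedy ".*" runs to the last ">", lazy ".*?" stops at the first one.
const greedy = "<a><b>".match(/<.*>/)[0];
const lazy = "<a><b>".match(/<.*?>/)[0];

console.log(withoutS, withS); // false true
console.log(greedy, lazy);    // <a><b> <a>
```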