Reputation: 43
I've been pounding my head on an expression for over an hour, without results. So it's time to ask for help.
In the following (multi-line) text:
Waltzes vol 15
Waltzes vol. 15
Waltzes vol. A
Waltzes, volume 15
volume 15: waltzes
The RegEx I came up with thus far matches the "vol"/"volume" part of each of these lines:
(?!^),*\s*(?:vol[ume]*\.*)\s*(?=[0-9A-Z]+)
All are correct, except the last one, which should not be included because it is at the beginning of a line.
As far as I can tell from the docs at http://www.regular-expressions.info/refadv.html, the `(?!^)` look-around part of the expression should exclude matches found by `,*\s*(?:vol[ume]*\.*)\s*(?=[0-9A-Z]+)` at the beginning of a line, but that doesn't seem to work.
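To see what the engine is actually doing, here is a minimal reproduction using Python's `re` module. The tester used in the question isn't named, so the flavor is an assumption, and `(?m)` is added on the further assumption that the tester treats `^` as a line anchor. `SAMPLE` and `ORIGINAL` are illustrative names.

```python
import re

# The question's pattern; (?m) is an assumption so that ^ can match at every line start.
ORIGINAL = re.compile(r"(?m)(?!^),*\s*(?:vol[ume]*\.*)\s*(?=[0-9A-Z]+)")

SAMPLE = "Waltzes vol 15\nWaltzes vol. 15\nWaltzes vol. A\nWaltzes, volume 15\nvolume 15: waltzes"

for m in ORIGINAL.finditer(SAMPLE):
    # repr() makes a leading "\n" visible in the matched text.
    print(m.start(), repr(m.group(0)))
```

In this sketch the last match is `'\nvolume '`, starting at the line break that ends the fourth line: the leading `\s*` swallows the newline, so `(?!^)` is tested at the end of the previous line, where `^` does not match.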
On the other hand, the expression `(?!^)op[us]*\.*\s*(?=[0-9]+)` works correctly and does not return a match on the last line of the following text:
Waltzes op. 15
Waltzes opus 15
opus 15: waltzes
What am I doing wrong with the first expression?
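The same kind of sketch (Python `re` assumed, names illustrative) runs the `op` pattern with and without multiline mode. Without it, the last line is matched as well, since `^` then only matches at the very start of the string:

```python
import re

OPUS_TEXT = "Waltzes op. 15\nWaltzes opus 15\nopus 15: waltzes"
OPUS_PATTERN = r"(?!^)op[us]*\.*\s*(?=[0-9]+)"

# Without multiline mode, ^ matches only at the start of the whole string.
single_line = re.findall(OPUS_PATTERN, OPUS_TEXT)
# With re.MULTILINE, ^ also matches right after each "\n".
multi_line = re.findall(OPUS_PATTERN, OPUS_TEXT, flags=re.MULTILINE)

print(single_line)
print(multi_line)
```

With `re.MULTILINE` the result is `['op. ', 'opus ']`; without it, a third `'opus '` from the last line appears. Nothing in front of `op` in this pattern can consume the preceding line break, which is the key difference from the first expression.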
Upvotes: 1
Views: 111
Reputation: 51711
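Here's why your regex isn't working as expected:
- You're using a negative lookahead, `(?!^)`, where a negative lookbehind is intended. It should be `(?<!^)`. (Since `^` has zero width, the two actually test the same position, so the placement and the flag below are what make the difference.)
- The lookbehind must come immediately before `(?:vol[ume]*\.*)`. At the start of the pattern it is checked before `,*\s*`, and `\s*` can consume the line break, so the match can begin at the end of the previous line, where `^` does not match.
- You need to enable multiline mode with `(?m)` (without it, `^` would only match at the start of the input).
So, your regex with these corrections becomes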
(?m),*\s*(?<!^)(?:vol[ume]*\.*)\s*(?=[0-9A-Z]+)
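A quick check of the corrected pattern, again as a Python sketch. The flavor is an assumption; Python's `re` accepts `(?<!^)` because `^` has zero width, which keeps the lookbehind fixed-width.

```python
import re

# The corrected pattern from this answer, unchanged.
CORRECTED = re.compile(r"(?m),*\s*(?<!^)(?:vol[ume]*\.*)\s*(?=[0-9A-Z]+)")

SAMPLE = "Waltzes vol 15\nWaltzes vol. 15\nWaltzes vol. A\nWaltzes, volume 15\nvolume 15: waltzes"

# One match per line for the first four lines; none for "volume 15: waltzes".
print(CORRECTED.findall(SAMPLE))
```

Here `\s*` can still consume the line break, but the lookbehind then sits at the start of the last line and fails, so only the first four lines produce matches.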
The above works but can be improved further. `[ume]*` would also allow matches like `voleee`, `volmeu`, etc., so list the two forms explicitly as `vol\.?|volume`. Also, instead of being unbounded with `*`, the `,` and the `.` can be made optional with `?`:
(?m),?\s*(?<!^)(?:vol\.?|volume)\s*(?=[0-9A-Z]+)
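The difference between the two versions shows up on a misspelling. In this Python sketch (flavor assumed), `TYPO` is a hypothetical input not taken from the question:

```python
import re

# Previous version: [ume]* accepts any run of u/m/e after "vol".
LOOSE = re.compile(r"(?m),*\s*(?<!^)(?:vol[ume]*\.*)\s*(?=[0-9A-Z]+)")
# Improved version: only "vol", "vol." or "volume", with an optional comma.
STRICT = re.compile(r"(?m),?\s*(?<!^)(?:vol\.?|volume)\s*(?=[0-9A-Z]+)")

SAMPLE = "Waltzes vol 15\nWaltzes vol. 15\nWaltzes vol. A\nWaltzes, volume 15\nvolume 15: waltzes"
TYPO = "Waltzes voleee 15"  # hypothetical misspelled input

print(STRICT.findall(SAMPLE))
print(LOOSE.findall(TYPO), STRICT.findall(TYPO))
```

`LOOSE` accepts `' voleee '`, while `STRICT` finds nothing in `TYPO` and still returns the same four matches on the sample text.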
Upvotes: 1
Reputation: 19480
If you are trying to match vol/vol./volume that is not at the beginning of a line, the following should work:
^.+(vol\.?|volume)
`^.+` means: match one or more characters from the beginning of the line.
`(vol\.?|volume)` means: match `vol` followed by an optional `.`, or match `volume`.
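A Python sketch of this answer's pattern (flavor assumed; `(?m)` is added so that `^` anchors every line of the sample, which the answer leaves to the tool):

```python
import re

# This answer's pattern, with (?m) added as an assumption.
PREFIXED = re.compile(r"(?m)^.+(vol\.?|volume)")

SAMPLE = "Waltzes vol 15\nWaltzes vol. 15\nWaltzes vol. A\nWaltzes, volume 15\nvolume 15: waltzes"

for m in PREFIXED.finditer(SAMPLE):
    # group(0) is the whole match from the line start; group(1) is the vol/volume part.
    print(repr(m.group(0)), repr(m.group(1)))
```

The full match includes everything from the line start, so the `vol`/`volume` text is in group 1. On the fourth line group 1 is `vol` rather than `volume`: the alternation tries `vol\.?` first and nothing after the group forces a longer match. Writing `(volume|vol\.?)` would capture the full word.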
Upvotes: 1
Reputation: 8463
Go with it, instead of fighting it:
^.+\s*(?:vol[ume]*\.*)\s*(?=[0-9A-Z]+)
Force a match at the beginning of the line (`^`), followed by one or more characters...
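A Python sketch of this approach (flavor assumed, `(?m)` added so that `^` anchors each line; `first_match` is an illustrative helper). Checked line by line it skips the last line, but on the whole sample as one string `\s*` can also match the line break:

```python
import re

# This answer's pattern, with (?m) added as an assumption.
LINE_PREFIX = re.compile(r"(?m)^.+\s*(?:vol[ume]*\.*)\s*(?=[0-9A-Z]+)")

SAMPLE = "Waltzes vol 15\nWaltzes vol. 15\nWaltzes vol. A\nWaltzes, volume 15\nvolume 15: waltzes"


def first_match(line):
    """Return the first match in a single line, or None if there is none."""
    m = LINE_PREFIX.search(line)
    return m.group(0) if m else None


# Line by line: the last line has no "vol" after its first character, so it is skipped.
print([first_match(line) for line in SAMPLE.splitlines()])

# Whole string: \s* can also match "\n", so line 4's match runs on into line 5.
print(LINE_PREFIX.findall(SAMPLE))
```

Per line, the result ends with `None` for `volume 15: waltzes`. On the whole string the fourth match is `'Waltzes, volume 15\nvolume '`, because the greedy `.+` takes all of line 4 and `\s*` then consumes the newline.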
Upvotes: 0