Reputation: 6301
I am trying to extract the substring James Hetfield from the following string using Java regex:
music works | with | composer | James Hetfield (musician)
I started with this code, but it does not work. I am not sure what I am missing:
final Pattern pattern = Pattern.compile("| (.+?) (musician)");
final Matcher matcher = pattern.matcher("music works | with | composer | James Hetfield (musician)");
matcher.find();
System.out.println(matcher.group(1)); // Prints String I want to extract
Thoughts?
Upvotes: 0
Views: 392
Reputation: 67988
([a-zA-Z](?:[a-zA-Z ]*))(?=\(musician\))
You can try this as well. Grab the first capture group. See the demo:
http://regex101.com/r/vR4fY4/19
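For use from Java, here is a minimal sketch of how the pattern above could be applied. The class and method names (LookaheadDemo, extract) are made up for illustration. Note that the character class also accepts spaces, so the capture ends with the space before (musician); the sketch trims it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class LookaheadDemo {
    // Same regex as above, written as a Java string literal (each \ is doubled).
    private static final Pattern NAME_BEFORE_MUSICIAN =
            Pattern.compile("([a-zA-Z](?:[a-zA-Z ]*))(?=\\(musician\\))");

    // Returns the captured name without the trailing space, or null if nothing matches.
    static String extract(String input) {
        Matcher matcher = NAME_BEFORE_MUSICIAN.matcher(input);
        return matcher.find() ? matcher.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String input = "music works | with | composer | James Hetfield (musician)";
        System.out.println(extract(input)); // James Hetfield
    }
}
```

Since the class only allows ASCII letters and spaces, names containing other characters (hyphens, accents, digits) would be cut short.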
Upvotes: 0
Reputation: 124275
Based on the fact that you used ( and ) to create groups, I assume you know that parentheses are special characters in regex. But do you know that special characters do not match their literal counterparts in the text? Notice that (.*) does not require the matched text to start and end with parentheses.
To let special characters match their literals, you need to escape them. You can do that in several ways, for example by placing \ before them (which needs to be written in a Java String as "\\"), or by surrounding each of them with [ ] to create a character class containing only that one special character. Similarly, | is a special character in regex that represents the OR operator, so you need to escape it as well.
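To make the difference concrete, here is a small sketch comparing unescaped and escaped patterns. The class name EscapingDemo and the helper found are made up for this example, and Pattern.quote is shown as a third escaping option that is not mentioned above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, only for illustration.
class EscapingDemo {
    // True if the regex matches anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Unescaped ( ) only create a group, so no literal parentheses are required.
        System.out.println(found("(musician)", "Hetfield musician"));                // true

        // Three ways to require the literal parentheses.
        System.out.println(found("\\(musician\\)", "Hetfield musician"));            // false
        System.out.println(found("[(]musician[)]", "Hetfield (musician)"));          // true
        System.out.println(found(Pattern.quote("(musician)"), "Hetfield (musician)")); // true

        // Unescaped | means OR: "works " alone is enough to match.
        System.out.println(found("works | with", "music works fine"));               // true
        // Escaped | must appear literally in the text.
        System.out.println(found("works \\| with", "music works fine"));             // false
        System.out.println(found("works \\| with", "music works | with"));           // true
    }
}
```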
Another thing is that .+?, despite being reluctant, in | (.+?) will start matching from the first | it finds, which means it can also accept other | characters until (musician) is found. In other words, such a regex (once | and the parentheses are escaped) would find this part:
music works | with | composer | James Hetfield (musician)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
So to prevent accepting other pipes (|) between the one you match and (musician), use [^|] instead of . ([^|] is a character class that accepts any character except |).
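The over-matching described above can be reproduced with a short sketch. It assumes the | and parentheses have already been escaped but .+? is kept; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, only for illustration.
class ReluctantDemo {
    // Escaped version of the question's pattern, still using the reluctant .+?
    private static final Pattern RELUCTANT =
            Pattern.compile("\\| (.+?) \\(musician\\)");

    // Returns {whole match, group 1}, or null if nothing matches.
    static String[] reluctantMatch(String input) {
        Matcher matcher = RELUCTANT.matcher(input);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(), matcher.group(1) };
    }

    public static void main(String[] args) {
        String input = "music works | with | composer | James Hetfield (musician)";
        String[] result = reluctantMatch(input);
        System.out.println(result[0]); // | with | composer | James Hetfield (musician)
        System.out.println(result[1]); // with | composer | James Hetfield
    }
}
```

Reluctance only controls how far the quantifier extends to the right; it does not move the starting point of the match, which is always the leftmost position where a match is possible.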
So try with this pattern:
final Pattern pattern = Pattern.compile("\\| ([^|]+) \\(musician\\)");
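Put together with the code from the question, a complete runnable sketch could look like this. The class name MusicianExtractor and the extract helper are made up for the example; checking the result of find() before calling group(1) avoids an IllegalStateException when the input does not match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class, only for illustration.
class MusicianExtractor {
    // Literal "| ", then anything except a pipe, then literal " (musician)".
    private static final Pattern PATTERN =
            Pattern.compile("\\| ([^|]+) \\(musician\\)");

    // Returns the text between the last pipe and " (musician)", or null if there is no match.
    static String extract(String input) {
        Matcher matcher = PATTERN.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "music works | with | composer | James Hetfield (musician)";
        System.out.println(extract(input)); // James Hetfield
    }
}
```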
UPDATE:
If it is possible that the part which should be matched by your regex will not have | before it (let's say it is at the start of your text), then you can simply make the \\| part optional by surrounding it with parentheses and adding ? after it. You can also place it in a non-capturing group, which lets ([^|]+) remain the group with index 1, so your code can stay the same (you will not have to change matcher.group(1) to matcher.group(2)).
So you can try with:
final Pattern pattern = Pattern.compile("(?:\\| )?([^|]+) \\(musician\\)");
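As a quick check of the optional-prefix version, the sketch below runs it against input with and without a leading pipe. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, only for illustration.
class OptionalPipeDemo {
    // (?:\| )? is a non-capturing, optional "| " prefix, so ([^|]+) stays group 1.
    private static final Pattern PATTERN =
            Pattern.compile("(?:\\| )?([^|]+) \\(musician\\)");

    // Returns the name before " (musician)", or null if there is no match.
    static String extract(String input) {
        Matcher matcher = PATTERN.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // With pipes before the name.
        System.out.println(extract("music works | with | composer | James Hetfield (musician)")); // James Hetfield
        // Name at the very start of the text, no pipe at all.
        System.out.println(extract("James Hetfield (musician)")); // James Hetfield
    }
}
```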
Upvotes: 5