Reputation: 5238
I have the following test string:
This is my "te
st" case
with lines for "tes"t"ing" with regex
But as he said "It could be an arbitrary number of words"
And I want to match everything between " characters, as long as the quotes are bound to words. I have the following regex:
\"([^\"]*)\"
which matches the words of "test" quite well, even though that word is split across lines. Is there a way to also match tes"t"ing as one whole word (and not split it into two words)? Trying with the word boundaries \b
(\b\"([^\"]*)\"\b)
doesn't work very well, because it matches neither the very first " nor the group just mentioned.
I need this for a Java regex.
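To see why the original pattern splits tes"t"ing, here is a minimal sketch (the class name OriginalAttempt and the helper groupOne are illustrative, not from the question) that runs \"([^\"]*)\" over the test string and collects Group 1. Matching proceeds left to right and each match consumes its closing quote, so the quotes pair up as the split te/st text, tes, ing, and the final sentence; the inner t is never captured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalAttempt {
    // The pattern from the question: a quote, any run of non-quote chars (line breaks included), a quote.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Collects the Group 1 text of every non-overlapping match, scanning left to right.
    static List<String> groupOne(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "This is my \"te\n\nst\" case\nwith lines for \"tes\"t\"ing\" with regex\n"
                + "But as he said \"It could be an arbitrary number of words\"";
        // "tes" and "ing" are reported as separate matches; the inner "t" falls between them.
        for (String g : groupOne(s)) {
            System.out.println("[" + g + "]");
        }
    }
}
```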
UPDATE: As a result, I need to get:
This is my \q{te
st} case
with lines for \q{tes"t"ing} with regex
But as he said \q{It could be an arbitrary number of words}
Upvotes: 3
Views: 114
Reputation: 110685
You could use the regular expression
(?<=\")(?:[a-z]+\"[a-z]+\"[a-z]+|[a-z][^"]+)(?=\")
with the case-insensitive flag i (or prefix the pattern with (?i)).
As seen at the link, this regex matches the following three substrings of the text given in the question:
te st
tes"t"ing
It could be an arbitrary number of words
The regex engine performs the following operations:
(?<=\") # match a double-quote in a positive lookbehind
(?: # begin a non-capture group
[a-z]+\" # match 1+ letters, then a double-quote
[a-z]+\" # match 1+ letters, then a double-quote
[a-z]+ # match 1+ letters
| # or
[a-z] # match 1 letter
[^"]+ # match 1+ characters other than a double-quote
) # end non-capture group
(?=\") # match a double-quote in a positive lookahead
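A minimal Java sketch of this pattern, assuming the embedded (?i) prefix is used instead of a separate flag (Pattern.CASE_INSENSITIVE should work equally well); the class LookaroundMatches and its findQuoted helper are hypothetical names. Because the quotes sit inside lookarounds, each match is only the text between them, so the loop collects m.group() rather than a capture group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundMatches {
    // The answer's regex; (?i) makes [a-z] also accept upper-case letters such as the "I" in "It".
    private static final Pattern QUOTED = Pattern.compile(
            "(?i)(?<=\")(?:[a-z]+\"[a-z]+\"[a-z]+|[a-z][^\"]+)(?=\")");

    // Returns each matched substring; the surrounding quotes are checked but not consumed.
    static List<String> findQuoted(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = QUOTED.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "This is my \"te\n\nst\" case\nwith lines for \"tes\"t\"ing\" with regex\n"
                + "But as he said \"It could be an arbitrary number of words\"";
        for (String match : findQuoted(s)) {
            System.out.println("[" + match + "]");
        }
    }
}
```

This only locates the substrings; producing the \q{...} form would additionally require consuming the surrounding quotes, which the lookarounds deliberately leave in place.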
Upvotes: 2
Reputation: 626845
You may use
.replaceAll("\\B\"\\b(.*?)\\b\"\\B", "\\\\q{$1}")
Or, if the matches may span multiple lines, add the (?s) modifier:
.replaceAll("(?s)\\B\"\\b(.*?)\\b\"\\B", "\\\\q{$1}")
See the regex demo.
Details
\B"\b - a " that is either at the start of the string or preceded by a non-word char, and that is followed by a word char
(.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
\b"\B - a " that is either at the end of the string or followed by a non-word char, and that is preceded by a word char.
The replacement is a backslash, q{, the Group 1 value ($1) and a }. The backslash is written "\\\\" in the Java string literal: a backslash is a special char in the replacement pattern, so the replacement string must contain two backslashes to insert one real, literal backslash.
See the Java demo:
String s = "This is my \"te\n\nst\" case\nwith lines for \"tes\"t\"ing\" with regex\nBut as he said \"It could be an arbitrary number of words\"";
System.out.println(s.replaceAll("\\B\"\\b(.*?)\\b\"\\B", "\\\\q{$1}"));
Output:
This is my "te
st" case
with lines for \q{tes"t"ing} with regex
But as he said \q{It could be an arbitrary number of words}
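The demo above omits (?s), so the quoted text that spans a line break is left unchanged. A minimal, self-contained sketch of the (?s) variant on the same input (the class QuoteToMacro and method toMacro are illustrative names) converts all three quoted parts, which matches the result requested in the question's update.

```java
public class QuoteToMacro {
    // The (?s) variant from this answer: DOTALL lets (.*?) run across line breaks.
    private static final String QUOTED_WORDS = "(?s)\\B\"\\b(.*?)\\b\"\\B";

    static String toMacro(String text) {
        // The Java literal "\\\\q{$1}" is the replacement \\q{$1}: an escaped backslash, q{, Group 1, }.
        return text.replaceAll(QUOTED_WORDS, "\\\\q{$1}");
    }

    public static void main(String[] args) {
        String s = "This is my \"te\n\nst\" case\nwith lines for \"tes\"t\"ing\" with regex\n"
                + "But as he said \"It could be an arbitrary number of words\"";
        System.out.println(toMacro(s));
    }
}
```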
NOTE:
If you also need to match two consecutive double quotes that are neither preceded nor followed by word characters, you can modify the above regular expression as follows:
.replaceAll("(?s)\\B(\"\\b(.*?)\\b\"|\"\")\\B", "\\\\q{$2}")
See the regex demo.
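A small sketch of this variant (the class EmptyQuotes, the method toMacro and the sample sentence are made up for illustration). When the "" alternative matches, Group 2 does not participate, and Java's Matcher appends nothing for a non-participating group, so the pair becomes \q{}.

```java
public class EmptyQuotes {
    // Alternation: a word-bound quoted run (Group 2) or a bare "" pair, both flanked by \B.
    private static final String QUOTED_OR_EMPTY = "(?s)\\B(\"\\b(.*?)\\b\"|\"\")\\B";

    static String toMacro(String text) {
        // For a "" match, $2 refers to a group that did not participate and expands to nothing.
        return text.replaceAll(QUOTED_OR_EMPTY, "\\\\q{$2}");
    }

    public static void main(String[] args) {
        System.out.println(toMacro("He said \"\" and \"hi\""));
    }
}
```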
Details
(?s) - an embedded flag option (equivalent to Pattern.DOTALL) that makes . match line break chars, too
\B - a non-word boundary; here it means that immediately to the left there must be a non-word char or the start of the string (because right after \B there is a non-word char, ")
( - start of the first capturing group:
"\b(.*?)\b" - a " followed by a word char, then Group 2 capturing any zero or more chars, as few as possible, and then a " preceded by a word char (that is why this alternative can't match "": right after the first " and right before the second there must be a letter, digit or _)
| - or
"" - a "" substring
) - end of the first capturing group
\B - a non-word boundary; here it means that immediately to the right there must be a non-word char or the end of the string (because right before \B there is a non-word char, ").
Upvotes: 2
Reputation: 785156
You may use this regex, which uses a lookbehind and a lookahead to ensure that the previous and next characters are not non-whitespace characters (that is, each side is whitespace or a string boundary):
(?<!\S)".*?"(?!\S)
Adding the OP's helpful comment, which solved the problem (a bit broader than what was described in the question):
str = str.replaceAll("(?s)(?<!\\S)\"(.*?)\"(?!\\S)", "\\\\q{$1}");
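A minimal sketch applying the OP's replaceAll call to the question's test string (the class WhitespaceBoundQuotes and method toMacro are hypothetical names). Because the boundaries here are whitespace-based rather than \B/\b-based, a closing quote followed by punctuation (for example in the input say "hi", ok) is left unchanged, whereas the \B-based answer above would convert it.

```java
public class WhitespaceBoundQuotes {
    // From the OP's comment: a quote not preceded by a non-whitespace char, a lazy run
    // (DOTALL, so line breaks are allowed), and a quote not followed by a non-whitespace char.
    private static final String WS_BOUND = "(?s)(?<!\\S)\"(.*?)\"(?!\\S)";

    static String toMacro(String text) {
        return text.replaceAll(WS_BOUND, "\\\\q{$1}");
    }

    public static void main(String[] args) {
        String s = "This is my \"te\n\nst\" case\nwith lines for \"tes\"t\"ing\" with regex\n"
                + "But as he said \"It could be an arbitrary number of words\"";
        System.out.println(toMacro(s));
    }
}
```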
Upvotes: 2