I want to:
- remove all whitespace unless it is directly before or after one of the predefined keywords (0-1 space before and 0-1 after). For example, with the keywords and, or and if, the spaces in " and ", " and" or "and " are left unchanged
- ignore everything between quotes
I've tried many patterns. The closest I've gotten is the one below, but it still removes the space after the keywords, which I'm trying to avoid.
Regex:
\s(?!and|or|if)(?=(?:[^"]*"[^"]*")*[^"]*$)
Test string:
if (ans(this) >= ans({1,2}) and (cond({3,4}) or ans(this) <= ans({5,6})), 7, 8) and {111} > {222} or ans(this) = "hello my friend and or " and(cond({1,2}) $1 123
Ideal result:
if (ans(this)>=ans({1,2}) and (cond({3,4}) or ans(this)<=ans({5,6})),7,8) and {111}>{222} or ans(this)="hello my friend and or " and(cond({1,2})$1123
I can then use str = str.replaceAll(...) in Java to remove that whitespace. I don't mind doing it in multiple steps, but I'm not familiar with regex, so I'm kind of stuck.
Any help would be appreciated!
Note: I edited the ideal result; sorry about that. Whitespace around a keyword should be shrunk to a single space if there is any. If there is none, it can either be left as is or get one space added: I just don't want "or ans" to become "orans", whereas "and(cond" becoming "and (cond" is fine. Everything between quotes should be ignored.
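To make the problem with the attempt concrete, here is a minimal sketch that applies the question's pattern unchanged with String.replaceAll (the class name AskerAttempt and method strip are illustrative, not from the question). The negative lookahead only protects a space that comes right before a keyword, so the space after a keyword is still removed, which is how "or ans" turns into "orans".

```java
class AskerAttempt {
    // The pattern from the question: a whitespace char that is not directly followed by
    // and/or/if, and that has an even number of double quotes after it (i.e. is outside quotes).
    static final String RX = "\\s(?!and|or|if)(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static String strip(String s) {
        // Every matching whitespace char is replaced with nothing.
        return s.replaceAll(RX, "");
    }

    public static void main(String[] args) {
        String test = "if (ans(this) >= ans({1,2}) and (cond({3,4}) or ans(this) <= ans({5,6})), 7, 8) and {111} > {222} or ans(this) = \"hello my friend and or \" and(cond({1,2}) $1 123";
        System.out.println(strip(test));
    }
}
```

For the test string this prints if(ans(this)>=ans({1,2}) and(cond({3,4}) orans(this)<=ans({5,6})),7,8) and{111}>{222} orans(this)="hello my friend and or " and(cond({1,2})$1123. Since the lookahead has no word boundary, a space before words such as "order" or "iffy" would also be kept.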
You may use
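import java.util.regex.Matcher;
import java.util.regex.Pattern;
String example = " if (ans(this) >= ans({1,2}) and (cond({3,4}) or ans(this) <= ans({5,6})), 7, 8) and {111} > {222} or ans(this) = \"hello my friend and or \" and(cond({1,2}) $1 123 ";
// Group 1: a keyword plus surrounding whitespace; Group 2: a quoted string; Group 3: any other whitespace
String rx = "\\s*\\b(and|or|if)\\b\\s*|(\"[^\"]*\")|(\\s+)";
Matcher m = Pattern.compile(rx).matcher(example);
// replaceAll(Function<MatchResult, String>) needs Java 9+; quoteReplacement keeps any $ or \ in quoted text literal
example = m.replaceAll(r -> r.group(3) != null ? "" : r.group(2) != null ? Matcher.quoteReplacement(r.group(2)) : " " + r.group(1) + " ").trim();
System.out.println(example);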
See the Java demo.
The pattern matches:
\s*\b(and|or|if)\b\s* - 0+ whitespace chars, a word boundary, Group 1 (one of and, or, if), a word boundary and then 0+ whitespace chars
| - or
("[^"]*") - Group 2: a ", then any 0+ chars other than ", and then a "
| - or
(\s+) - Group 3: 1+ whitespace chars.
If Group 3 matches, the whitespace is removed; if Group 2 matches, the quoted text is put back into the result unchanged; and if Group 1 matches, the keyword is wrapped with single spaces and pasted back. The whole result is then trimmed with .trim().
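The callback overload of Matcher.replaceAll only exists since Java 9. On Java 8 the same logic can be written as an explicit appendReplacement/appendTail loop; the following is a minimal sketch of that (KeywordSpaces and normalize are illustrative names). It uses the answer's pattern unchanged and passes every replacement through Matcher.quoteReplacement, so a $ or \ inside quoted text is copied literally instead of being read as a group reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordSpaces {
    // Group 1: keyword with any surrounding whitespace; Group 2: a double-quoted string;
    // Group 3: any other run of whitespace.
    private static final Pattern RX =
        Pattern.compile("\\s*\\b(and|or|if)\\b\\s*|(\"[^\"]*\")|(\\s+)");

    static String normalize(String input) {
        Matcher m = RX.matcher(input);
        StringBuffer sb = new StringBuffer(); // the StringBuffer overloads work on Java 8
        while (m.find()) {
            String replacement;
            if (m.group(3) != null) {
                replacement = "";                     // plain whitespace: drop it
            } else if (m.group(2) != null) {
                replacement = m.group(2);             // quoted text: keep verbatim
            } else {
                replacement = " " + m.group(1) + " "; // keyword: exactly one space on each side
            }
            // quoteReplacement stops '$' or '\' in the replacement being treated specially
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String example = " if (ans(this) >= ans({1,2}) and (cond({3,4}) or ans(this) <= ans({5,6})), 7, 8) and {111} > {222} or ans(this) = \"hello my friend and or \" and(cond({1,2}) $1 123 ";
        System.out.println(normalize(example));
    }
}
```

For the question's test string this prints if (ans(this)>=ans({1,2}) and (cond({3,4}) or ans(this)<=ans({5,6})),7,8) and {111}>{222} or ans(this)="hello my friend and or " and (cond({1,2})$1123, where "and(cond" has become "and (cond", which the question says is acceptable. Without quoteReplacement, an input like x = "a $1 b" would lose the $1 inside the quotes.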
You could make intelligent use of capturing groups. The general idea here would be
match_this|or_this|or_even_this|(but_capture_this)
In terms of a regular expression this could be
(?:(?:\s+(?:and|or|if)\s+)|"[^"]+")|(\s+)
You'd then need to replace the match only if the first capturing group is not empty.
(Engines such as PCRE also offer the (*SKIP*)(*FAIL) verbs, which serve the same purpose; Java's java.util.regex does not support them.)
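Replacing the match only when the first group is set needs per-match logic, so in Java a minimal sketch of that step uses an appendReplacement loop (SkipWhitespace and strip are illustrative names). It keeps the answer's pattern verbatim: whitespace captured in Group 1 is deleted, while keyword matches and quoted strings are written back as they were.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SkipWhitespace {
    // The pattern from the answer: keywords surrounded by whitespace and quoted strings are
    // matched but not captured; any other whitespace run is captured in group 1.
    private static final Pattern RX =
        Pattern.compile("(?:(?:\\s+(?:and|or|if)\\s+)|\"[^\"]+\")|(\\s+)");

    static String strip(String input) {
        Matcher m = RX.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Only a match that set group 1 is deleted; everything else is written back verbatim.
            String replacement = m.group(1) != null ? "" : m.group();
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String test = "if (ans(this) >= ans({1,2}) and (cond({3,4}) or ans(this) <= ans({5,6})), 7, 8) and {111} > {222} or ans(this) = \"hello my friend and or \" and(cond({1,2}) $1 123";
        System.out.println(strip(test));
    }
}
```

Because this pattern requires whitespace on both sides of a keyword, a keyword at the very start or one directly followed by "(" is not protected. For the question's test string the sketch prints if(ans(this)>=ans({1,2}) and (cond({3,4}) or ans(this)<=ans({5,6})),7,8) and {111}>{222} or ans(this)="hello my friend and or "and(cond({1,2})$1123.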