Reputation: 6559
My goal is to split a string (using Java or Scala) at all occurrences of "yy" that are neither followed nor preceded by another letter "y". Examples:
"aa-yy-bb" -> ["aa-", "-bb"]
"aa-yyyy-bb" -> ["aa-yyyy-bb"]
"yyy-bb" -> ["yyy-bb"]
"yy-bb" -> ["","-bb"]
"aa-yy-bb-yy" -> ["aa-","-bb-",""]
I ended up with mystring.split("(^|[^y])yy([^y]|$)", -1)
but this solution is invalid since it also consumes the neighboring characters, e.g., it outputs "aa-yy-bb" -> ["aa", "bb"].
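To make the problem reproducible, here is a minimal sketch of that attempt. The class and method names (ConsumingSplitDemo, splitConsuming) are made up for illustration. The groups (^|[^y]) and ([^y]|$) are part of the match, so the characters they capture are removed along with the delimiter.

```java
import java.util.Arrays;

public class ConsumingSplitDemo {
    // Hypothetical helper wrapping the attempted pattern from the question.
    static String[] splitConsuming(String s) {
        // The character classes on both sides consume one neighbor each,
        // so the delimiter that gets removed here is "-yy-", not "yy".
        return s.split("(^|[^y])yy([^y]|$)", -1);
    }

    public static void main(String[] args) {
        // prints [aa, bb] instead of the desired [aa-, -bb]
        System.out.println(Arrays.toString(splitConsuming("aa-yy-bb")));
    }
}
```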
Of course this split can be done by parsing manually, but I wonder whether an (elegant) pattern matching solution exists. Can you find one?
Upvotes: 0
Views: 120
Reputation: 152
According to the documentation for the Pattern class you could use this expression:
\byy\b
\b: a word boundary
This only matches yy as a whole word. Even though lookaheads are made for these kinds of tasks, a boundary matcher is shorter in this case.
EDIT: This answer doesn't work on all valid inputs.
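The edit does not say which inputs fail. One case that seems likely (my assumption, not stated in the answer) is "yy" directly next to another letter: there is no word boundary between two letters, so \b never matches there, although the question would still want a split. A small sketch, with hypothetical class and method names:

```java
import java.util.Arrays;

public class WordBoundaryDemo {
    // Hypothetical helper: split on "yy" delimited by word boundaries.
    static String[] splitOnWordBoundary(String s) {
        return s.split("\\byy\\b", -1);
    }

    public static void main(String[] args) {
        // '-' is not a word character, so boundaries exist around "yy":
        // prints [aa-, -bb]
        System.out.println(Arrays.toString(splitOnWordBoundary("aa-yy-bb")));

        // 'a' and 'b' are word characters, so there is no boundary and no split:
        // prints [aayybb], while the question's rule would give [aa, bb]
        System.out.println(Arrays.toString(splitOnWordBoundary("aayybb")));
    }
}
```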
Upvotes: 1
Reputation: 21975
Lookarounds are made for that
(?<!y)yy(?!y)
(?<!y): negative lookbehind
yy: matches the characters yy literally (case sensitive)
(?!y): negative lookahead
Upvotes: 5
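For completeness, a minimal sketch of how the lookaround pattern could be plugged into String.split, run against the examples from the question. The class and method names (LookaroundSplitDemo, splitOnIsolatedYy) are made up for illustration. The limit of -1 is taken from the question and keeps trailing empty strings.

```java
import java.util.Arrays;

public class LookaroundSplitDemo {
    // Hypothetical helper: split on "yy" that has no "y" directly before or after it.
    // Lookarounds are zero-width, so neighboring characters are not consumed.
    static String[] splitOnIsolatedYy(String s) {
        return s.split("(?<!y)yy(?!y)", -1);
    }

    public static void main(String[] args) {
        String[] inputs = {"aa-yy-bb", "aa-yyyy-bb", "yyy-bb", "yy-bb", "aa-yy-bb-yy"};
        for (String input : inputs) {
            // Arrays.toString shows empty strings as nothing between the commas.
            System.out.println(input + " -> " + Arrays.toString(splitOnIsolatedYy(input)));
        }
    }
}
```

The same call should work unchanged from Scala, since it uses java.lang.String.split.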