cubic lettuce

Reputation: 6559

Split a string whenever the split character occurs exactly twice

My goal is to split a string (using Java or Scala) at all occurrences of "yy" that are neither followed nor preceded by another letter "y". Examples:

"aa-yy-bb" -> ["aa-", "-bb"]
"aa-yyyy-bb" -> ["aa-yyyy-bb"]
"yyy-bb" -> ["yyy-bb"]
"yy-bb" -> ["","-bb"]
"aa-yy-bb-yy" -> ["aa-","-bb-",""]

I ended up at mystring.split("(^|[^y])yy([^y]|$)", -1), but this solution is invalid because it also drops the characters adjacent to "yy", e.g., it outputs "aa-yy-bb" -> ["aa", "bb"].
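
To make the problem concrete, here is a minimal Java sketch of that attempt. The class and method names are hypothetical and only serve the illustration; the pattern is the one quoted above. The groups around "yy" each consume one character, so those characters become part of the delimiter and are removed from the result.

```java
import java.util.Arrays;

class ConsumingSplitDemo {
    // Hypothetical helper: splits with the pattern from the question.
    // (^|[^y]) and ([^y]|$) each consume a neighboring character,
    // so the neighbors are treated as part of the delimiter.
    static String[] splitConsuming(String input) {
        return input.split("(^|[^y])yy([^y]|$)", -1);
    }

    public static void main(String[] args) {
        // Prints [aa, bb] instead of the desired [aa-, -bb]
        System.out.println(Arrays.toString(splitConsuming("aa-yy-bb")));
    }
}
```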

Of course, this split can be done by parsing manually, but I wonder whether an (elegant) pattern-matching solution exists. Can you find one?

Upvotes: 0

Views: 120

Answers (2)

Axel F

Reputation: 152

According to the documentation for the Pattern class, you could use this expression:

\byy\b
  • \b A word boundary

This matches yy only as a whole word. Although lookarounds are made for this kind of task, a boundary matcher is shorter in this case.

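EDIT: This answer doesn't work on all valid inputs. \b only matches between a word character and a non-word character, so "yy" surrounded by other letters or digits (e.g. "aayybb") is not split.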
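
The limitation can be seen in a small Java sketch. The class and method names below are hypothetical; only the \byy\b pattern comes from this answer. It splits as desired when "yy" is surrounded by non-word characters such as "-", but finds no match when the neighbors are letters.

```java
import java.util.Arrays;

class WordBoundaryDemo {
    // Hypothetical helper: splits on "yy" only where it forms a whole word.
    static String[] splitOnWordBoundary(String input) {
        return input.split("\\byy\\b", -1);
    }

    public static void main(String[] args) {
        // '-' is a non-word character, so both boundaries exist: [aa-, -bb]
        System.out.println(Arrays.toString(splitOnWordBoundary("aa-yy-bb")));
        // 'a' and 'b' are word characters, so there is no boundary
        // and no split happens: [aayybb]
        System.out.println(Arrays.toString(splitOnWordBoundary("aayybb")));
    }
}
```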

Upvotes: 1

Yassin Hajaj

Reputation: 21975

Lookarounds are made for that

Regex101

(?<!y)yy(?!y)

  • (?<!y) Negative lookbehind: the match must not be preceded by y
  • yy matches the characters yy literally (case sensitive)
  • (?!y) Negative lookahead: the match must not be followed by y
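
As a sketch of how this pattern could be used with String.split in Java (the class and method names are hypothetical; the limit of -1 is taken from the question so that trailing empty strings are kept): lookarounds are zero-width, so the neighboring characters are checked but not consumed.

```java
import java.util.Arrays;

class LookaroundSplit {
    // Hypothetical helper: splits at every "yy" that is neither
    // preceded nor followed by another 'y'. The lookarounds only
    // assert; they do not consume the neighboring characters.
    static String[] splitOnDoubleY(String input) {
        return input.split("(?<!y)yy(?!y)", -1);
    }

    public static void main(String[] args) {
        // The five examples from the question
        String[] inputs = {"aa-yy-bb", "aa-yyyy-bb", "yyy-bb", "yy-bb", "aa-yy-bb-yy"};
        for (String input : inputs) {
            System.out.println(input + " -> " + Arrays.toString(splitOnDoubleY(input)));
        }
    }
}
```

Applied to the examples in the question, this should give the requested results, including the empty leading and trailing strings for "yy-bb" and "aa-yy-bb-yy".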

Upvotes: 5
