mahemoff

Reputation: 46479

Ruby - Splitting multiple strings with scan

I have a string like xfooxbar and want to split it into ['foo', 'bar'] using scan. (Before anyone asks why not use split, the real example is more complex, in which I'd need to get the boundary string too, which split discards. I'm asking this question to understand more about how scan works or if there's a similar alternative, as I found this harder than I expected.)

This doesn't work because it keeps scanning until the end of the string:

"xfooxbar".scan(/(?:x)(.*)/)
> [["fooxbar"]]

The problem is that scan doesn't magically stop scanning when it finds the next pattern, and making it non-greedy with (.*?) just makes every capture empty, since nothing after the group forces it to expand. So we can add an endpoint as the next match:

"xfooxbar".scan(/(?:x)(.*)(?:x)/)
> [["foo"]]

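The problem now is that scan apparently doesn't return overlapping matches: it keeps a pointer to the current position and resumes after the end of the previous match rather than rewinding. The first match consumed the second boundary, so scanning resumes after it and there is no x left to start a new match (the ?: has no effect on this, since a non-capturing group still consumes what it matches).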
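
To make this concrete, here is a small sketch that inspects the first match with MatchData. The variable names are only illustrative. It shows that the whole match spans both boundaries, so nothing is left for a second match to start from, and it also shows the empty captures from the non-greedy attempt.

```ruby
str = "xfooxbar"

# The first (and only) match of the two-boundary pattern.
m = str.match(/(?:x)(.*)(?:x)/)
whole_match = m[0]     # text consumed by the match, including both boundaries
captured    = m[1]     # only the capture group
resume_at   = m.end(0) # index just past the match, where scan continues

# Look for another boundary at or after resume_at; nil means none is left.
next_boundary = str.index("x", resume_at)

# The non-greedy variant without an endpoint captures empty strings.
lazy_result = str.scan(/(?:x)(.*?)/)

p whole_match, captured, resume_at, next_boundary, lazy_result
```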

Upvotes: 1

Views: 257

Answers (2)

Sam

Reputation: 650

Unless I am missing something, can't this be done with a simple "not x" regex?

(I have expanded out the original string to prove the point)

pry(main)> "nonexfooxbarxgreedy\ngreedyxgoose".scan(/x([^x]*)/)
=> [["foo"], ["bar"], ["greedy\ngreedy"], ["goose"]]
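
Since the question mentions also needing the boundary string, one possible extension is to give the boundary its own capture group. This is only a sketch: the input with two different boundary characters, x and y, is a made-up example and not from the question.

```ruby
# Hypothetical input with two different boundary characters, "x" and "y".
str = "xfooybar"

# Group 1 captures the boundary itself; group 2 captures everything up to
# (but not including) the next boundary character.
pairs = str.scan(/([xy])([^xy]*)/)

pairs.each do |boundary, text|
  puts "#{boundary.inspect} => #{text.inspect}"
end
```

With two groups in the pattern, scan returns one [boundary, text] pair per match.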

Upvotes: 1

Avinash Raj

Reputation: 174766

Use a positive lookbehind assertion, as below.

irb(main):001:0> "xfooxbar".scan(/(?<=x)[^x]*/)
=> ["foo", "bar"]
  • (?<=x) Positive lookbehind: asserts that the match must be preceded by the letter x, without consuming it.
  • [^x]* Matches any character except x, zero or more times.
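
A related sketch, in case the boundary is longer than one character and a negated character class won't do: keep the lazy (.*?) from the question, but give it a zero-width endpoint with a lookahead so the next boundary is not consumed. The "--" boundary in the second example is an assumption for illustration only.

```ruby
# (?=x|\z) is zero-width: it requires the next boundary or the end of the
# string to follow, but leaves it unconsumed for the next match.
single = "xfooxbar".scan(/x(.*?)(?=x|\z)/).flatten

# Same idea with a hypothetical multi-character boundary "--".
multi = "--foo--bar".scan(/--(.*?)(?=--|\z)/).flatten

p single, multi
```

Note that . does not match newlines unless the /m flag is added.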

Upvotes: 2
