BrainLikeADullPencil

Reputation: 11673

How do I keep the delimiters when splitting a Ruby string?

I have text like:

content = "Do you like to code? How I love to code! I'm always coding." 

I'm trying to split it on either a ? or . or !:

content.split(/[?.!]/)

When I print out the results, the punctuation delimiters are missing.

Do you like to code

How I love to code

I'm always coding

How can I keep the punctuation?

Upvotes: 43

Views: 16799

Answers (5)

sawa

Reputation: 168269

Answer

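Use a positive lookbehind assertion (i.e. (?<=...)) in the regular expression to keep the delimiter at the end of each string. A lookbehind is zero-width and is not a capture group, so split breaks the string right after the punctuation without consuming it: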

content.split(/(?<=[?.!])/)

# Returns an array with:
# ["Do you like to code?", " How I love to code!", " I'm always coding."]

That leaves a whitespace character at the start of the second and third strings. Add a match for zero or more whitespace characters (\s*) after the lookbehind to exclude it:

content.split(/(?<=[?.!])\s*/)

# Returns an array with:
# ["Do you like to code?", "How I love to code!", "I'm always coding."]
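For reuse, the lookbehind split can be wrapped in a small method. This is only a sketch: the method name split_sentences is made up for illustration, and it assumes every ?, . or ! ends a sentence.

```ruby
# Hypothetical helper (the name is an assumption, not part of the answer).
# Splits right after each ?, . or ! and drops the whitespace that follows.
def split_sentences(text)
  text.split(/(?<=[?.!])\s*/)
end

content = "Do you like to code? How I love to code! I'm always coding."
sentences = split_sentences(content)
puts sentences
# Do you like to code?
# How I love to code!
# I'm always coding.
```

Because every period counts as a delimiter here, abbreviations such as "e.g." would be split as well.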

Additional Notes

While it doesn't make sense with your example, the delimiter can instead be kept at the front of each string, starting with the second one. This is done with a positive lookahead assertion (i.e. (?=...)). For the sake of anyone looking for that technique, here's how to do that:

content.split(/(?=[?.!])/)

# Returns an array with:
# ["Do you like to code", "? How I love to code", "! I'm always coding", "."]

A better example to illustrate the behavior is:

content = "- the - quick brown - fox jumps"
content.split(/(?=-)/)

# Returns an array with:
# ["- the ", "- quick brown ", "- fox jumps"]

Notice that the square-bracket character class wasn't necessary since there is only one delimiter. Also, since the first match happens at the first character, no empty string is produced before it and "- the " ends up as the first item in the array.
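To see the two assertions side by side, here is a minimal comparison on a made-up string (the string "a-b-c" and the variable names are just for illustration):

```ruby
text = "a-b-c"

# Lookbehind: the split point is just after each "-", so the dash stays
# at the end of the preceding piece.
trailing = text.split(/(?<=-)/)

# Lookahead: the split point is just before each "-", so the dash moves
# to the front of the following piece.
leading = text.split(/(?=-)/)

p trailing  # ["a-", "b-", "c"]
p leading   # ["a", "-b", "-c"]
```

In both cases nothing is consumed by the split, so joining the pieces gives back the original string.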

Upvotes: 67

Bob

Reputation: 2303

Use partition. An example from the documentation:

"hello".partition("l")         #=> ["he", "l", "lo"]

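Note that partition only splits at the first match, so on its own it yields one sentence plus the rest of the text. A rough sketch of how it could be applied repeatedly to the question's string follows; the helper name sentences_via_partition is hypothetical and not part of the documentation example.

```ruby
content = "Do you like to code? How I love to code! I'm always coding."

# partition returns [before, match, after] for the first match only.
first, delimiter, remainder = content.partition(/[?.!]/)
p [first, delimiter, remainder]

# Hypothetical helper: keep partitioning the remainder until nothing is left.
def sentences_via_partition(text)
  sentences = []
  rest = text
  until rest.empty?
    head, delim, rest = rest.partition(/[?.!]/)
    sentences << (head + delim).strip
  end
  # Drop empty leftovers such as trailing whitespace after the last delimiter.
  sentences.reject(&:empty?)
end

p sentences_via_partition(content)
```

When no delimiter is found, partition returns the whole string followed by two empty strings, which is what ends the loop.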
Upvotes: 9

Chris Heald

Reputation: 62698

To answer the question's title, adding a capture group to your split regex will preserve the split delimiters:

"Do you like to code? How I love to code! I'm always coding.".split /([?!.])/
  => ["Do you like to code", "?", " How I love to code", "!", " I'm always coding", "."]

From there, it's pretty simple to reconstruct sentences (or do other massaging as the problem calls for it):

content.split(/([?!.])/).each_slice(2).map(&:join).map(&:strip)
 => ["Do you like to code?", "How I love to code!", "I'm always coding."]

The regexes given in other answers do fulfill the body of the question more succinctly, though.

Upvotes: 20

the Tin Man

Reputation: 160631

I'd use something like:

content.scan(/.+?[?!.]/)
# => ["Do you like to code?", " How I love to code!", " I'm always coding."]

If you want to get rid of the intervening spaces, use:

content.scan(/.+?[?!.]/).map(&:lstrip)
# => ["Do you like to code?", "How I love to code!", "I'm always coding."]
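One thing to watch with scan: the pattern requires a closing ?, ! or ., so any trailing text without punctuation is silently dropped. The sketch below shows that, plus one possible variation that also accepts the end of the string; the variation and the sample text are assumptions for illustration, not part of the answer.

```ruby
text = "Hello there. No final punctuation here"

# Original pattern: each match must end in ?, ! or .
strict = text.scan(/.+?[?!.]/).map(&:lstrip)

# Variation (an assumption): also let a match end at the end of the string.
lenient = text.scan(/.+?(?:[?!.]|\z)/).map(&:lstrip)

p strict   # ["Hello there."]
p lenient  # ["Hello there.", "No final punctuation here"]
```

Also keep in mind that . does not match a newline unless the /m flag is used, so multi-line input may need that flag.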

Upvotes: 9

Powers

Reputation: 19348

The most robust way to do this is with a Natural Language Processing library; see the question "Rails gem to break a paragraph into series of sentences".

You can also split in groups:

@content.split(/(\?+)|(\.+)|(!+)/)

After splitting into groups, you can join the sentence and delimiter.

@content.split(/(\?+)|(\.+)|(!+)/).each_slice(2) {|slice| puts slice.join}
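If an array is wanted instead of printed output, the slices can be collected with map. In the sketch below a single capture group with a character class is swapped in for the three alternated groups, which keeps runs of punctuation such as "??" or "..." together; the sample text and variable names are assumptions for illustration.

```ruby
text = "Really?? Yes... Go!"

# The capture group makes split return the delimiters as separate elements.
pieces = text.split(/([?.!]+)/)
p pieces     # ["Really", "??", " Yes", "...", " Go", "!"]

# Pair each sentence body with the delimiter that follows it.
sentences = pieces.each_slice(2).map { |slice| slice.join.strip }
p sentences  # ["Really??", "Yes...", "Go!"]
```

If the text does not end in punctuation, the last slice has only one element, and join still returns it unchanged.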

Upvotes: 2
