Timmy Von Heiss

Reputation: 2218

Using ruby scan split join with regex

I have this string:

@string = "Hello.My email is [email protected] and my name is James."

I want to add a space specifically between periods and capital letters. I want to change @string to:

"Hello. My email is [email protected] and my name is James."

I have the following code:

@string.scan(/.[A-Z]/)
# => [".M"]

Upvotes: 0

Views: 223

Answers (2)

Ibrahim

Reputation: 6098

You could use gsub with two capturing groups:

@string = "Hello.My email is [email protected] and my name is James."
@string.gsub!(/(\.)([A-Z])/, '\1 \2')

Output:

"Hello. My email is [email protected] and my name is James."

Update:

Another good way to do it is with a positive lookahead, which matches only the period and leaves the capital letter untouched. Thanks to @CarySwoveland for suggesting it.

@string = "Hello.My email is [email protected] and my name is James."
@string.gsub(/\.(?=[A-Z])/, '. ')
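
Note that the two snippets above differ in one detail: gsub! modifies the string in place, while gsub returns a new string and leaves the receiver untouched. A minimal sketch of the difference, using a shortened sample string and illustrative variable names that are not part of the question:

```ruby
original = "Hello.My name is James."

# gsub returns a new string; the receiver is unchanged.
copy = original.gsub(/\.(?=[A-Z])/, '. ')

# gsub! mutates the receiver and returns nil when nothing was replaced.
mutable     = original.dup
first_call  = mutable.gsub!(/\.(?=[A-Z])/, '. ')
second_call = mutable.gsub!(/\.(?=[A-Z])/, '. ')

p original     # => "Hello.My name is James."
p copy         # => "Hello. My name is James."
p mutable      # => "Hello. My name is James."
p second_call  # => nil
```

Because gsub! can return nil, it is usually safer not to chain further calls onto its return value.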

Upvotes: 1

Wiktor Stribiżew

Reputation: 627087

To match a literal . you need to escape the dot (\.), because an unescaped dot matches any character other than a line break. You also need gsub rather than scan, since you want to replace text, not just extract matches.
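
As a small sketch of why the escaping matters (the sample string is shortened from the question and the variable names are illustrative only), compare what scan returns with and without the backslash:

```ruby
# Illustrative sample, shortened from the question's string.
s = "Hello.My name is James."

# Unescaped dot: any character followed by an ASCII capital letter.
any_char = s.scan(/.[A-Z]/)

# Escaped dot: only a literal period followed by an ASCII capital letter.
literal_dot = s.scan(/\.[A-Z]/)

p any_char     # => [".M", " J"]
p literal_dot  # => [".M"]
```

Either way, scan only reports the matches; it does not change the string.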

Use

s = "Hello.My email is [email protected] and my name is James."
s = s.gsub(/\.\K(?=[[:upper:]])/, ' ')

See the Ruby demo. A capturing group variation that still allows consecutive matches:

s = s.gsub(/(\.)(?=[[:upper:]])/, '\1 ')

Or a lookbehind-based one:

s = s.gsub(/(?<=\.)(?=[[:upper:]])/, ' ')

Details

  • \. - a literal dot
  • \K - a match reset operator ((?<=\.) is equal to \.\K in functionality)
  • (?=[[:upper:]]) - a positive lookahead that requires the presence of an uppercase letter immediately to the right of the current location.

In the capturing group based pattern, (\.) forms Group 1 and \1 inserts the value back when replacing.
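
To check that the three variants described above behave the same way, here is a short sketch. The sample string and the variable names are made up for illustration:

```ruby
s = "Hello.My name is James.Nice to meet you."

# \K drops the already matched dot from the match, so only a space is inserted.
with_k = s.gsub(/\.\K(?=[[:upper:]])/, ' ')

# The dot is captured in Group 1 and written back via \1.
with_group = s.gsub(/(\.)(?=[[:upper:]])/, '\1 ')

# Zero-width match between a dot (lookbehind) and a capital (lookahead).
with_lookbehind = s.gsub(/(?<=\.)(?=[[:upper:]])/, ' ')

p with_k                                           # => "Hello. My name is James. Nice to meet you."
p [with_k, with_group, with_lookbehind].uniq.size  # => 1
```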

Here is a way to leave dotted abbreviations such as U.S. intact:

s = "Hello.My email is [email protected] and my name is M.B.S James."
rx = /(\b[[:upper:]](?:\.[[:upper:]])+)\b|\.([[:upper:]])/
puts s.gsub(rx) { |m| 
  m == $~[1] ? $~[1] : ". #{$~[2]}" 
}

See another Ruby demo

Here,

  • (\b[[:upper:]](?:\.[[:upper:]])+)\b - a word boundary, a single uppercase letter, then one or more repetitions of a . followed by a single uppercase letter, and a trailing word boundary; the abbreviation is captured into Group 1.
  • | - or
  • \.([[:upper:]]) - a dot and the uppercase letter captured into Group 2.

If Group 1 matched, $~[1] (the Group 1 value) is inserted back unchanged; otherwise the match is replaced with a dot, a space, and the Group 2 value. Note that $~ is the match data object currently in use inside the gsub block, and $~[N] is the value of Group N.
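
If the $~ global reads as too cryptic, the same logic can be sketched with Regexp.last_match, which returns the same match data object. This is only an alternative spelling of the block above, shown on a shortened sample string; the local name m is illustrative:

```ruby
s  = "Hello.My name is M.B.S James."
rx = /(\b[[:upper:]](?:\.[[:upper:]])+)\b|\.([[:upper:]])/

result = s.gsub(rx) do
  m = Regexp.last_match   # same object as $~
  # Group 1 is nil unless the abbreviation branch matched.
  m[1] || ". #{m[2]}"
end

p result  # => "Hello. My name is M.B.S James."
```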

Upvotes: 1
