Reputation: 23

How do I convert a string to an array with spaces preserved in Ruby?

How do I convert a String: 'Hello world!' to an array: ['Hello', ' ', 'world!'] with all spaces preserved?

I tried to convert the string using the split method with different parameters, but I didn't find the right solution.

Also I didn't find any other method in the documentation (Class: String (Ruby 3.1.0)) suitable for solving this problem.

Upvotes: 2

Answers (3)

user16452228

You can continue to use split and still preserve spaces by using a simple regex with a capture group:

"Hello   World  ! ".split(/( +)/)
#=>  ["Hello", "   ", "World", "  ", "!", " "]

The only catch I'm aware of is that strings starting with a space will result in an array that starts with an empty string:

"  Hello   World  ! ".split(/( +)/)
#=>  ["", "  ", "Hello", "   ", "World", "  ", "!", " "]

If this is a problem, you can add something like drop_while to the mix:

"  Hello   World  ! ".split(/( +)/).drop_while(&:empty?)
#=>  ["  ", "Hello", "   ", "World", "  ", "!", " "]
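One way to check that nothing was lost is to join the pieces back together and compare the result with the original string. A minimal sketch, where `split_keeping_spaces` is a hypothetical helper name of my own and not part of Ruby's standard library:

```ruby
# Hypothetical helper: split on runs of spaces, keeping each run as its own element.
def split_keeping_spaces(str)
  # The capture group makes String#split include the separators in the result.
  # drop_while removes the empty leading element produced when str starts with a space.
  str.split(/( +)/).drop_while(&:empty?)
end

input  = "  Hello   World  ! "
pieces = split_keeping_spaces(input)

p pieces
# Joining the pieces should reproduce the input exactly.
p pieces.join == input
```

Because the only element dropped is an empty string, the round trip still holds for strings with leading spaces.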

Upvotes: 1

Todd A. Jacobs

Reputation: 84343

Use String#scan Instead of String#split

Plain String#split discards the delimiters it splits on, so on its own it won't preserve your spaces (unless the pattern contains a capture group). String#scan or String#partition is a more direct fit. Using Unicode character properties, you can scan for matches with:

'Hello   world!'.scan(/[\p{Alnum}\p{Punct}]+|\p{Space}+/)
#=> ["Hello", "   ", "world!"]

You can also use POSIX character classes (called "bracket expressions" in Ruby) to do the same thing if you prefer. For example:

'Hello   world!'.scan(/[[:alnum:][:punct:]]+|[[:space:]]+/)
#=> ["Hello", "   ", "world!"]

Either of these options will be more robust than solutions that rely on ASCII-only characters or literal whitespace atoms, but if you know your strings won't include other types of characters or encodings then those solutions will work too.
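To illustrate the robustness point, here is a small sketch comparing the bracket-expression pattern with a literal-space pattern. The sample string is my own assumption, not taken from the question: its words are separated by a no-break space (U+00A0) rather than an ASCII space.

```ruby
# Sample input (an assumption for illustration): the gap between the words
# is a no-break space, U+00A0, rather than an ASCII space.
text = "Hello\u00A0world!"

# POSIX bracket expressions are Unicode-aware on UTF-8 strings,
# so U+00A0 is recognized as whitespace.
unicode_aware = text.scan(/[[:alnum:][:punct:]]+|[[:space:]]+/)

# A literal-space pattern treats U+00A0 as an ordinary non-space character.
ascii_only = text.scan(/ +|[^ ]+/)

p unicode_aware.length  # three pieces: word, no-break space, word
p ascii_only.length     # one piece: the whole string
```

If your input only ever contains ASCII spaces, both patterns give the same result.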

Using Metacharacters for Brevity, Cautiously

If you're looking for brevity in your regular expression, and you're sure you won't need to concern yourself with Unicode characters or explicitly differentiating between non-whitespace characters and punctuation, you can also use the \s and \S metacharacters. For example:

'Hello   world!'.scan(/\s+|\S+/)
#=> ["Hello", "   ", "world!"]

This is generally less robust than the character properties or bracket expressions above, but is still unambiguous, short, and easy to read. It fits your example, so it's worth mentioning, but the \S metacharacter can match control characters and other unexpected things, so you need to be cautious with it unless you really know your data. For example, your string might contain an invisible NUL or a control character like CTRL-D, in which case \S would catch it and return a Unicode-escaped character:

"\x00".scan(/\S+/)
#=> ["\u0000"]

?\C-D.scan(/\S+/)
#=> ["\u0004"]

This is probably not what you'd expect, but given a larger data set this type of thing inevitably happens. The more explicit you can be, the fewer problems you're likely to have with your production data.
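Being explicit has a trade-off of its own: String#scan silently skips anything that none of the alternatives match. A hedged sketch of the difference, using a sample string of my own with an embedded NUL:

```ruby
# Sample input (an assumption for illustration) with an embedded NUL character.
text = "Hello\x00 world!"

# \S treats the NUL as part of the first word.
loose = text.scan(/\s+|\S+/)

# With the explicit classes, NUL is not alphanumeric, punctuation, or whitespace,
# so neither alternative matches it and scan moves past it.
strict = text.scan(/[[:alnum:][:punct:]]+|[[:space:]]+/)

p loose
p strict
# The strict result no longer round-trips, because the NUL was skipped.
p strict.join == text
```

Which behavior you want depends on whether unexpected characters should be kept in the tokens or filtered out.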

Using String#partition

For the very simple use case in your original example, you only have two words separated by whitespace. That means you can also use String#partition to partition on the sequential whitespace. That will split the string into exactly three elements, preserving the whitespace that partitions the words. For example:

'Hello   world!'.partition(/\s+/)
#=> ["Hello", "   ", "world!"]

While simpler, the partitioning approach won't work as well with longer strings such as:

'Goodbye   cruel world!'.partition(/\s+/)
#=> ["Goodbye", "   ", "cruel world!"]

so String#scan is going to be a better and more flexible approach for the general use case. However, anytime you want to split a string into three elements, or to preserve the partitioning element itself, #partition can be very handy.

Upvotes: 3

user1934428

Reputation: 22225

It just occurred to me that you could use scan. Assuming that your string is stored in the variable s, and you want to separate space regions from non-space regions, you could do

s.scan(/[ ]+|[^ ]+/)

which would yield in your case

["Hello", "   ", "world!"]
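Note that this pattern treats only the literal space character as a separator. A short sketch of what that means for other whitespace, using a sample string of my own that contains a tab:

```ruby
# Sample input (an assumption for illustration): a tab between the first two words.
s = "Hello\tbig   world!"

# Only literal spaces separate tokens, so the tab stays inside the first token.
space_only = s.scan(/[ ]+|[^ ]+/)

# \s also matches tabs and newlines, so the tab becomes its own token.
any_whitespace = s.scan(/\s+|\S+/)

p space_only
p any_whitespace
```

Both variants preserve every character of the input; they differ only in where the token boundaries fall.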

Upvotes: 4
