curious

Reputation: 21

Ruby string split on more than one character

I have a string, say "Hello_World I am Learning,Ruby". I would like to split this string into its individual words. What's the best way?

Thanks! C.

Upvotes: 2

Views: 3861

Answers (6)

mu is too short

Reputation: 434595

Just for fun, here's a Unicode-aware version for Ruby 1.9 (or 1.8 with Oniguruma):

>> "This_µstring has words.and thing's".split(/[^\p{Word}']|\p{Connector_Punctuation}/)
=> ["This", "µstring", "has", "words", "and", "thing's"]

Or maybe:

>> "This_µstring has words.and thing's".split(/[^\p{Word}']|_/)
=> ["This", "µstring", "has", "words", "and", "thing's"]

The real problem is deciding which sequences of characters constitute a "word" in this context. You might want to look at the Oniguruma docs for the supported character properties; Wikipedia has some notes on the properties as well.
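To make the "what is a word" question concrete, here is a small comparison of three candidate definitions applied to the same string. It's a sketch that assumes Ruby 2.0 or later (where source files default to UTF-8); the variable names are only for illustration.

```ruby
text = "This_µstring has words.and thing's"

# \w is ASCII-only in Ruby ([a-zA-Z0-9_]), so the micro sign µ ends a word
ascii_words = text.scan(/\w+/)

# \p{Word} is Unicode-aware and includes connector punctuation such as _
unicode_words = text.scan(/\p{Word}+/)

# \p{L} matches letters only, so both _ and ' act as separators
letter_words = text.scan(/\p{L}+/)

p ascii_words    # => ["This_", "string", "has", "words", "and", "thing", "s"]
p unicode_words  # => ["This_µstring", "has", "words", "and", "thing", "s"]
p letter_words   # => ["This", "µstring", "has", "words", "and", "thing", "s"]
```

None of these keeps "thing's" together on its own, which is why the patterns above add the apostrophe explicitly.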

Upvotes: 0

Samnang

Reputation: 5606

You could use \W, which matches any non-word character. Since the underscore counts as a word character, it has to be added to the class:

"Hello_World I am Learning,Ruby".split /[\W_]/
=> ["Hello", "World", "I", "am", "Learning", "Ruby"]

To treat a run of separators (such as the comma and spaces below) as a single delimiter, add +:

"Hello_World I am Learning,   Ruby".split /[\W_]+/
=> ["Hello", "World", "I", "am", "Learning", "Ruby"]
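The difference between the two patterns only shows up once separators repeat. A quick sketch of why the `+` matters; the input strings are made up for illustration:

```ruby
text = "Hello_World I am Learning,   Ruby"

# Without +, every separator is its own delimiter, so the run of four
# separators (",   ") leaves three empty strings behind
single = text.split(/[\W_]/)

# With +, a whole run of separators counts as one delimiter
collapsed = text.split(/[\W_]+/)

# split still keeps an empty first field when the string starts with a separator
leading = " Hello, World".split(/[\W_]+/)

p single     # => ["Hello", "World", "I", "am", "Learning", "", "", "", "Ruby"]
p collapsed  # => ["Hello", "World", "I", "am", "Learning", "Ruby"]
p leading    # => ["", "Hello", "World"]
```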

Upvotes: 5

Bohdan

Reputation: 8408

String#scan seems to be an appropriate method for this task:

irb(main):018:0> "Hello_World    I am Learning,Ruby".scan(/[a-z]+/i)
=> ["Hello", "World", "I", "am", "Learning", "Ruby"]

Or you might use the built-in \w class, though it treats the underscore as part of a word, so "Hello_World" stays together:

irb(main):020:0> "Hello_World    I am Learning,Ruby".scan(/\w+/)
=> ["Hello_World", "I", "am", "Learning", "Ruby"]
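One practical difference between `scan` and `split` is that `scan` only returns what matched, so it never yields empty strings. A sketch comparing them; the leading ", " and the accented sample are made-up inputs, and the POSIX class `[[:alpha:]]` is swapped in to show an alternative to `[a-z]` that also matches non-ASCII letters:

```ruby
text = ", Hello_World    I am Learning,Ruby"

# split keeps an empty first field because the string starts with separators
split_words = text.split(/[^a-zA-Z]+/)

# scan returns only the matched runs of letters, so nothing is empty
scan_words = text.scan(/[a-z]+/i)

# [a-z] is ASCII-only and cuts accented words apart;
# [[:alpha:]] matches Unicode letters in Ruby
ascii_scan = "Olá_mundo, café".scan(/[a-z]+/i)
alpha_scan = "Olá_mundo, café".scan(/[[:alpha:]]+/)

p split_words  # => ["", "Hello", "World", "I", "am", "Learning", "Ruby"]
p scan_words   # => ["Hello", "World", "I", "am", "Learning", "Ruby"]
p ascii_scan   # => ["Ol", "mundo", "caf"]
p alpha_scan   # => ["Olá", "mundo", "café"]
```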

Upvotes: 1

BlueFish

Reputation: 5135

Whilst the above examples work, when splitting a string into words I think it's probably better to split on the characters that are not part of any kind of word. Here's how I did it:

str = "Hello_World I am Learning,Ruby"
str.split(/[^a-zA-Z]/).reject(&:empty?)

This statement does the following:

  1. Splits the string on every character that is not an ASCII letter
  2. Then rejects any empty strings from the result (split never returns nil elements, so there is nothing for compact to remove)

It handles most combinations of words. The above examples require you to list all the characters you want to split on. It's far easier to specify the characters that make up a word and split on everything else.
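A quick check of the chain above. `String#split` never puts `nil` into its result, so a `compact` call would have nothing to remove; `reject(&:empty?)` is what does the cleanup, for example when the string starts with separators. The ", Hello" input is just an illustration:

```ruby
str = "Hello_World I am Learning,Ruby"

# Split on every character that is not an ASCII letter
parts = str.split(/[^a-zA-Z]/)
no_nils = parts.none?(&:nil?) # split never yields nil

# A leading ", " produces two empty fields; reject(&:empty?) removes them
messy = ", Hello".split(/[^a-zA-Z]/)
clean = messy.reject(&:empty?)

p parts    # => ["Hello", "World", "I", "am", "Learning", "Ruby"]
p no_nils  # => true
p messy    # => ["", "", "Hello"]
p clean    # => ["Hello"]
```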

Upvotes: 0

Zoltán Szőcs

Reputation: 1141

You can use String#split with a regex pattern as the argument, like this:

"Hello_World I am Learning,Ruby".split /[ _,.!?]/
=> ["Hello", "World", "I", "am", "Learning", "Ruby"]
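The trade-off with an explicit list is that any separator you didn't list stays glued to a word. A sketch with a made-up input containing a tab and a hyphen; `\s`, `-` and the `+` quantifier are additions to the original character class:

```ruby
text = "Hello_World I am Learning,Ruby\tand-more"

# Only the listed characters split, so the tab and the hyphen survive
listed = text.split(/[ _,.!?]/)

# \s covers tabs and newlines, a trailing - inside [] is literal,
# and + treats a run of separators as one
extended = text.split(/[\s_,.!?-]+/)

p listed    # => ["Hello", "World", "I", "am", "Learning", "Ruby\tand-more"]
p extended  # => ["Hello", "World", "I", "am", "Learning", "Ruby", "and", "more"]
```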

Upvotes: 2

Jin

Reputation: 13463

ruby-1.9.2-p290 :022 > str =  "Hello_World I am Learning,Ruby"
ruby-1.9.2-p290 :023 > str.split(/\s|,|_/)
=> ["Hello", "World", "I", "am", "Learning", "Ruby"] 
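Each alternative in `/\s|,|_/` is a single character, so the same pattern can also be written as a character class, which is a little shorter and easy to extend. A small sketch; the "Learning, Ruby" input (comma followed by a space) is made up:

```ruby
str = "Hello_World I am Learning,Ruby"

# An alternation of single characters and a character class split identically
alternation = str.split(/\s|,|_/)
char_class  = str.split(/[\s,_]/)

# A comma followed by a space is two separators, leaving an empty string
# unless + is added to the class
gap_plain = "Learning, Ruby".split(/\s|,|_/)
gap_plus  = "Learning, Ruby".split(/[\s,_]+/)

p alternation == char_class  # => true
p gap_plain                  # => ["Learning", "", "Ruby"]
p gap_plus                   # => ["Learning", "Ruby"]
```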

Upvotes: 1
